ABSTRACT



The relationship between the structure of a neural network and its ability to 
perform nonlinear mapping is analyzed. A new algorithm, called the conjugate gradient 
optimization method, for calculating the weights and thresholds of a neural network is 
presented. The performance of the conjugate gradient algorithm is then compared to the 
well-known backpropagation method and shown to be more computationally efficient.
A neural network using the conjugate gradient algorithm is then applied to three simple
examples to demonstrate its signal processing capabilities. The first example illustrates
the ability of the neural network to perform classification. The second compares the
performance of a one-step linear predictor to a neural network for a nonlinear chaotic
time series. The neural network predictor is shown to provide much greater accuracy
than its linear counterpart. The final application presented demonstrates the ability of
a neural network to perform channel equalization for a nonminimum phase channel. Its
performance is then compared to its linear equivalent. 
I. INTRODUCTION



Artificial neural networks have been studied for many years in the hope of 
achieving human-like performance. Neural networks consist of highly connected sets 
of relatively simple processing elements. Computations are performed collectively by 
the entire network with the activity distributed over all the processing elements. This 
parallel distributed processing provides neural networks with the potential to solve 
complex problems more quickly than present serial methods.
The nonlinear nature and simple structure of neural networks provide a formalism 
for the study of nonlinear signal processing. 

The application of neural networks to signal processing involves developing an 
understanding of the relationship between the structure of a neural network and its 
ability to perform the desired input-to-output mapping. A neural network's structure
is defined by the number and type of processing elements in the network, the values
of the weights that connect the processing elements together, and a threshold value
associated with each processing element. Past work has led to a large variety of
neural network models. The models include the Hopfield network, the single- and
multi-layer perceptron network, the reduced Coulomb energy (RCE) classifier, and
the adaptive resonance theory (ART) model [Ref. 1:pp. 65-73]. Each model differs
in its structure and the manner in which the weights and thresholds of the network
are derived. One current method for calculating the weights and thresholds of a
feedforward multilayer neural network, called the backpropagation method, uses a
steepest descent method to iteratively adapt the weights and thresholds of the network
[Ref. 2:p. 127]. This method has generally been shown to be slow to converge to the
optimal set of weights and thresholds for a given problem [Ref. i:p. 300]. The
objectives of this thesis research were therefore:



• Investigate the relationship between the structure of a neural network and its 
ability to perform input-output mapping. 

• Develop an alternative to the backpropagation method that converges more 
quickly to the optimal set of weights and thresholds for any given problem. 

• Compare the performance of a neural network to its linear counterpart for some 
representative signal processing applications. 

Chapter II provides a general overview of the theory of neural networks. A 
graphical approach is employed to demonstrate the ability of neural networks to 
perform nonlinear mapping for various network configurations. The results are then 
related to a theorem by Kolmogorov. The backpropagation method for calculating 
the weights and thresholds of the neural network is also introduced. 

Chapter III deals with the derivation of an alternative algorithm to the
backpropagation method for calculating the weights and thresholds of a neural network.
The conjugate gradient optimization method is presented and then applied to the
neural network model. The Fibonacci line search method used in conjunction with the
conjugate gradient method is also discussed. The final section of the chapter presents
details concerning the actual implementation of the algorithm, including experimentally
derived parameters.

Chapter IV presents the results of the thesis research. The conjugate gradient
algorithm's performance is compared to the backpropagation method and is shown
to be more computationally efficient. A neural network using the conjugate
gradient algorithm is then applied to three simple examples to validate the performance
of the new algorithm and to demonstrate the types of tasks that a neural network
can perform. The first example illustrates the neural network's ability to perform
classification. A two input neural network is successfully "taught" to differentiate
between sets of points falling inside and outside a circle. The second example compares
the performance of a one-step linear predictor to a neural network for a nonlinear
chaotic time series generated using the Feigenbaum logistic function. This application
demonstrates the nonlinear mapping ability of the neural network. The neural
network predictor is shown to provide much greater accuracy than its linear counterpart.
The final application presented demonstrates the ability of a neural network to
perform channel equalization for a nonminimum phase channel. Its performance is
compared to its linear equivalent and is shown to be superior.

Chapter V contains the overall conclusions of the thesis research and provides 
recommendations for future research. 



II. FUNDAMENTALS - HOW NEURAL 
NETWORKS WORK 



A. THE BASIC BUILDING BLOCK 

A neural network is a system of relatively simple processing elements whose 
function is determined by its network structure, connection weights, and the transfer 
function of each neuron. Figure 2.1 shows a single artificial neuron, the fundamental
building block for all neural networks. A set of inputs $x_1, x_2, \ldots, x_n$ is applied
through a set of associated connection weights $w_1, w_2, \ldots, w_n$ to the neuron.




The inputs correspond to the stimulation levels and the weights to the synaptic
strengths of a biological neuron. The neuron sums the weighted inputs, adds a
threshold value, and applies the result to the neuron's transfer function $f(x)$. This
operation can be expressed as

$$z = f\left(\sum_{i=1}^{n} w_i x_i + \theta\right) \qquad (2.1)$$

or in vector notation

$$z = f(\mathbf{w}^T \mathbf{x} + \theta) \qquad (2.2)$$

where $\mathbf{x}$ is a column vector of inputs, $\mathbf{w}$ the corresponding column vector of weights,
and $\theta$ the neuron's threshold value.
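
As a minimal sketch of the operation just described, the following Python fragment computes the output of one artificial neuron as a transfer function applied to the weighted sum of the inputs plus the threshold. The function names (`neuron_output`, `identity`, `hard_limit`), the choice of +1/-1 as the two output levels, and the numerical values are illustrative assumptions and are not taken from the thesis.

```python
def identity(r):
    """Linear transfer function: passes the weighted sum through unchanged."""
    return r


def hard_limit(r):
    """Hypothetical two-valued transfer function: +1.0 if r >= 0, else -1.0."""
    return 1.0 if r >= 0.0 else -1.0


def neuron_output(x, w, theta, f):
    """Return z = f(w^T x + theta) for a single artificial neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    r = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return f(r)


# Illustrative values only: the weighted sum is 0.5*1.0 - 0.25*2.0 + 0.1.
x = [1.0, 2.0]
w = [0.5, -0.25]
theta = 0.1
z_linear = neuron_output(x, w, theta, identity)
z_limited = neuron_output(x, w, theta, hard_limit)
```

Passing the transfer function as an argument keeps the summation step separate from the choice of nonlinearity, which is the distinction the next section develops.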

B. THE TRANSFER FUNCTION 

A number of possibilities arise for selection of an appropriate transfer function. 
These include most notably: the signum function, the linear function, and the sigmoid 
function. Initial research conducted in the 1950’s and 1960’s by Rosenblatt, Minsky
and others used the signum function shown in Figure 2.2 [Ref. 3]. The signum function 
will be used for a preliminary discussion of how neural networks operate. 





Figure 2.2: Signum function (f(x) plotted against x)

Artificial neurons using the signum transfer function were referred to as
perceptrons [Ref. 3]. The signum transfer function causes the output of the perceptron to
take one of two discrete values. The point at which the neuron switches from low to
high or high to low is determined by the input weights and the perceptron's threshold
value. It has been shown that a single perceptron has the ability to distinguish
between two classes of inputs [Ref. 4:p. 13]. This is demonstrated in Figure 2.3 for a
two input network.
The combination of weights ($w_1$ and $w_2$) and the offset ($\theta$) define a line where the
output of the network ($z$) is high for the class of inputs falling on one side of the line
and low for the second class of inputs falling on the other side. If there are $n$ inputs
to a single perceptron, as pictured in Figure 2.1, the perceptron can construct a
hyperplane in the $n$-dimensional input space separating the two classes of inputs.
Input classes that cannot be separated by a simple hyperplane therefore cannot be
accurately differentiated by a single perceptron.

This problem can be remedied by cascading the perceptrons into several layers. 
This type of network topology is called a feedforward network because the output 
from the previous layer is fed forward to only the neurons in the next layer of the 
network. By adding additional layers, more complex boundaries can be defined. A 
two layer network is capable of defining decision regions that are convex or concave 
in shape. For the two input case shown in Figure 2.4, each perceptron in the first 
layer defines a boundary line. A single second layer perceptron weights and combines 
the outputs from the first layer perceptrons to produce the two decision regions. As 
pictured in Figure 2.4 a two layer network can also define a single enclosed region. 
With the addition of a third layer, disjoint enclosed regions can be combined to create 
a decision map of any arbitrary complexity, given a sufficient number of perceptrons 
in each layer. This is illustrated in Figure 2.5. 
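
The two layer construction can be illustrated with a small sketch. Here four first layer perceptrons each define one boundary line of the unit square, and a single second layer perceptron fires only when all four first layer outputs are high, which yields a single enclosed (convex) decision region. The 0/1 output levels, the unit-square example, and the names `step`, `perceptron`, `FIRST_LAYER`, and `inside_square` are assumptions made for illustration only.

```python
def step(r):
    """Two-valued transfer function with assumed output levels 1 (high) and 0 (low)."""
    return 1 if r >= 0.0 else 0


def perceptron(x, w, theta):
    """Single perceptron: hard-limited weighted sum plus threshold."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + theta)


# First layer: each (weights, threshold) pair defines one boundary line.
FIRST_LAYER = [
    ([1.0, 0.0], 0.0),    # high when x1 >= 0
    ([-1.0, 0.0], 1.0),   # high when x1 <= 1
    ([0.0, 1.0], 0.0),    # high when x2 >= 0
    ([0.0, -1.0], 1.0),   # high when x2 <= 1
]


def inside_square(x):
    """Two layer network: returns 1 inside the unit square, 0 outside."""
    hidden = [perceptron(x, w, theta) for w, theta in FIRST_LAYER]
    # Second layer: unit weights and a threshold of -3.5 act as a logical AND,
    # so the output is high only when all four first layer outputs are high.
    return perceptron(hidden, [1.0, 1.0, 1.0, 1.0], -3.5)
```

For example, `inside_square([0.5, 0.5])` is high while `inside_square([1.5, 0.5])` is low. A third layer could combine several such enclosed regions, in the manner described above.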

The performance of a multilayer perceptron network using the signum transfer 
function is satisfactory provided the desired output from the network is limited to two 
discrete values (i.e., high or low). This would be appropriate for a binary classifier 
system, where each output would represent one of two classes, i.e., a binary value. 
It does not, however, provide sufficient resolution for analog (continuously valued) or 
the corresponding discrete valued output functions associated with most other signal 
processing applications. 

Figure 2.3: Single neuron and associated decision regions




Figure 2.4: Two layer network and associated decision regions 





Figure 2.5: Three layer network and associated decision regions 



One example of a transfer function that would be capable of providing such a 
continuously variable output is the linear transfer function. In this case, the output of 
the artificial neuron would simply be the weighted sum of the inputs plus the neuron’s 
threshold value. This can be expressed as 

$$z = f(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i + \theta \qquad (2.3)$$

or in vector notation

$$z = f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + \theta. \qquad (2.4)$$

This is the transfer function used by Widrow and Hoff in their development of the
adaptive linear (adaline) and multiple adaptive linear (madaline) filters [Ref. 5:p.
10]. A great deal has been written concerning research and applications of the
adaptive linear filter although it has not often been referred to as a neural model 
[Ref. 6], [Ref. 7], [Ref. 8]. One key feature of the linear neural network is that there is 
no functional difference between a multilayer and a single layer network. For example, 
for the simple two layer network in Figure 2.6 the output of the first layer neurons 
can be written as 

$$f_1(x_1, x_2) = w_1 x_1 + w_2 x_2 + \theta_1 \qquad (2.5)$$

and

$$f_2(x_1, x_2) = w_3 x_1 + w_4 x_2 + \theta_2. \qquad (2.6)$$

The output of the network can then be written as

$$f_{out} = w_5 f_1(x_1, x_2) + w_6 f_2(x_1, x_2) + \theta_3. \qquad (2.7)$$

After some algebraic manipulation and substitution, the final result is

$$f_{out} = (w_5 w_1 + w_6 w_3) x_1 + (w_5 w_2 + w_6 w_4) x_2 + (w_5 \theta_1 + w_6 \theta_2 + \theta_3). \qquad (2.8)$$



From the above discussion, it is clear that, regardless of the number of layers in the 
network, the network can always be reduced to a single layer network. Essentially 
then, the linear adaptive filter is nothing more than the linear version of a single layer 
neural network. 
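
The reduction of a linear two layer network to a single layer can be checked numerically. The sketch below assumes a network with two inputs, two first layer linear neurons with weights (w1, w2) and (w3, w4) and thresholds theta1 and theta2, and one output neuron with weights (w5, w6) and threshold theta3. This labeling, the function names, and the numerical values are assumptions made for illustration.

```python
def two_layer_linear(x1, x2, w, th):
    """Evaluate the assumed two layer linear network layer by layer."""
    w1, w2, w3, w4, w5, w6 = w
    t1, t2, t3 = th
    f1 = w1 * x1 + w2 * x2 + t1      # first layer neuron 1
    f2 = w3 * x1 + w4 * x2 + t2      # first layer neuron 2
    return w5 * f1 + w6 * f2 + t3    # output neuron


def collapse(w, th):
    """Return (a1, a2, b) of the equivalent single neuron a1*x1 + a2*x2 + b."""
    w1, w2, w3, w4, w5, w6 = w
    t1, t2, t3 = th
    a1 = w5 * w1 + w6 * w3
    a2 = w5 * w2 + w6 * w4
    b = w5 * t1 + w6 * t2 + t3
    return a1, a2, b


# Illustrative weights and thresholds only.
W = (0.5, -1.0, 2.0, 0.25, 1.5, -0.75)
TH = (0.1, -0.2, 0.3)
A1, A2, B = collapse(W, TH)
```

For any input pair, `two_layer_linear(x1, x2, W, TH)` and `A1 * x1 + A2 * x2 + B` agree to within rounding error, which is the sense in which the additional linear layer adds no functional capability.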

A third transfer function which has been recently popularized by Rumelhart et
al. [Ref. 9] is called the sigmoid function. It is defined by the equation

$$f(r) = \frac{1}{1 + e^{-r}} \qquad (2.9)$$

where

$$r = \sum_{i=1}^{n} w_i x_i + \theta = \mathbf{w}^T \mathbf{x} + \theta. \qquad (2.10)$$

The sigmoid function, pictured in Figure 2.7, has a shape which would appear to fall
somewhere between the linear transfer function and the signum transfer function.
Its output is limited to a continuous range of values between zero and one. For values
of $r$ near zero, the transfer function behaves in a nearly linear fashion with an
approximately constant slope (one quarter for the logistic form $1/(1 + e^{-r})$). If the
input weights to the neuron are kept sufficiently small and the range
of input values limited, the sigmoidal artificial neuron can be made to appear linear.
Likewise, by using large values for the input weights $\mathbf{w}$, the values for $r$ would vary
more rapidly and the sigmoidal artificial neuron would more closely approximate the
signum function. As a result, the output of the network can be made to approximate
both linear and nonlinear combinations of the inputs depending on the values of the
network's weights ($\mathbf{w}$) and thresholds ($\theta$).
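
The two limiting behaviors can be seen numerically in a short sketch. It assumes the logistic form of the sigmoid, $1/(1 + e^{-r})$, whose slope at $r = 0$ is 1/4. The single-input neuron, the function names, and the weight values 0.01 and 50 are illustrative assumptions.

```python
import math


def sigmoid(r):
    """Logistic sigmoid (assumed form), with output between zero and one."""
    return 1.0 / (1.0 + math.exp(-r))


def sigmoid_neuron(x, w, theta=0.0):
    """Single-input sigmoidal neuron: f(w*x + theta)."""
    return sigmoid(w * x + theta)


# Small weight: the neuron is nearly linear, z ~ 0.5 + 0.25 * (w * x).
small_w = 0.01
z_small = sigmoid_neuron(1.0, small_w)
z_small_linear = 0.5 + 0.25 * small_w * 1.0

# Large weight: the neuron switches sharply between low and high,
# approximating a two-valued (signum-like) transfer function.
large_w = 50.0
z_high = sigmoid_neuron(0.5, large_w)
z_low = sigmoid_neuron(-0.5, large_w)
```

With the small weight, `z_small` and `z_small_linear` differ by less than one part in a million, while with the large weight the outputs on either side of zero are essentially one and zero.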

A theorem developed by Kolmogorov and described in Reference 10 provides 
further insight into the potential capabilities of a multilayer sigmoidal neural network. 
The theorem states that any continuous function of n variables can be represented 
using only linear summations and nonlinear but continuously increasing functions of 
only one variable. This would indicate that a three layer artificial neuron feedforward
network using a sigmoidal transfer function is capable of representing any nonlinear
multivariable function. The theorem, however, does not indicate the number of
neurons required in each layer, or how the values for the weights should be derived.

Figure 2.7: Neuron transfer functions (signum, sigmoid, linear)

It has been suggested that one approach to representing an $n$-dimensional nonlinear
function using neural networks might be by a weighted combination of $n$-dimensional
'bumps' [Ref. 11]. This is somewhat analogous to the Fourier series
representation of an arbitrary signal where weighted combinations of sinusoids of
suitable frequencies are used. To see how a nonlinear function might be represented
using a sigmoidal neural network, let us look at the case where we have a nonlinear
function of two variables. The output of the nonlinear function could be interpreted
as a two dimensional surface in a three dimensional space. The output of a single
sigmoidal neuron would have a surface like that pictured in Figure 2.8.




Figure 2.8: A sigmoid surface 

The orientation of the rising slope of the sigmoidal surface is determined by the
neuron's input weights ($\mathbf{w}$). Its position is determined by its threshold ($\theta$) value.
The height of the surface is controlled by the weight connected to the output of the
neuron. If we add a second neuron with the same orientation, but a slightly different
position than the first by using a different threshold value ($\theta$), and use an output
weight equal to but opposite in sign of the first, we can form a ridge as shown in
Figure 2.9.




Figure 2.9: A ridge 



A second ridge, perpendicular to the first, can then be constructed by adding 
two additional neurons to the first layer and selecting appropriate input weight values. 
The sum of the two ridges then forms the surface pictured in Figure 2.10. The weights 
connecting the outputs of the first layer neurons to the single second layer neuron 
along with the second layer neuron’s threshold value can then be adjusted to yield a 
true bump shown in Figure 2.11. 
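
The ridge, pseudo-bump, and bump constructions can be sketched in a few lines. The sketch assumes logistic sigmoids, ridges aligned with the two input axes, and hand-picked values for the steepness, half-width, output gain, and level. None of these values or function names come from the thesis.

```python
import math


def sigmoid(r):
    """Numerically stable logistic sigmoid (assumed form)."""
    if r >= 0.0:
        return 1.0 / (1.0 + math.exp(-r))
    e = math.exp(r)
    return e / (1.0 + e)


def ridge(u, steepness=10.0, half_width=1.0):
    """Two first layer neurons with the same orientation, different thresholds,
    and output weights of +1 and -1: high for |u| < half_width, low elsewhere."""
    rising = sigmoid(steepness * u + steepness * half_width)
    falling = sigmoid(steepness * u - steepness * half_width)
    return rising - falling


def pseudo_bump(x1, x2):
    """Sum of two perpendicular ridges: about 2 at the center, 1 along each arm."""
    return ridge(x1) + ridge(x2)


def bump(x1, x2, gain=20.0, level=1.5):
    """Second layer neuron: its weights (gain) and threshold (-gain * level)
    suppress the arms of the pseudo-bump and keep only the central peak."""
    return sigmoid(gain * (pseudo_bump(x1, x2) - level))
```

With these values, `bump(0.0, 0.0)` is close to one, while points lying on only one ridge, such as `(5.0, 0.0)`, produce an output close to zero.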

We can now represent any surface as a combination of these bumps. The network
topology to accomplish this would consist of multiple copies of the two layer network and
a single third layer neuron to weight and sum the bumps. The resulting surface is
pictured in Figure 2.12. The preceding development provides some insight into the
number of neurons required in each layer of a neural network to adequately represent
a given nonlinear function. A given function might be more efficiently represented
using a combination of sigmoidal surfaces or ridges rather than bumps. Better
knowledge of the function to be represented leads to a better decision
concerning the neural network topology required.

Figure 2.10: A pseudo-bump

Figure 2.11: A bump

Figure 2.12: Multiple bumps

C. CALCULATION OF WEIGHTS AND THRESHOLDS 

The burning question that has yet to be addressed concerning the feedforward
sigmoidal neural network is how to calculate the weights ($\mathbf{w}$) and the neuron
thresholds ($\theta$) of the network to yield a satisfactory representation of a given nonlinear
function. A method called backpropagation, developed by Rumelhart, has proven
popular and has been demonstrated to work fairly well [Ref. 2]. The backpropagation
method uses a training data set consisting of sets of inputs and a desired output value.
A set of inputs is applied to the neural network and the resulting network output is
compared to the desired value. The error between the neural network's output and
the desired output, along with the current state of the neural network, is used to modify
the neural network's weights and threshold values. The state of the neural network is
defined by the current input to the network, its weights, thresholds, and each neuron's
transfer function. The backpropagation method attempts to minimize the sum of the
squared errors over the entire training data set. This can be expressed as

$$E = \sum_t e^2(t) = \sum_t [y(t) - z(t)]^2 \qquad (2.11)$$

where $E$ is the total squared error, $e(t)$ is the network output error for the $t$th input
set, $y(t)$ is the desired or target output for the $t$th input set, and $z(t)$ is the actual
output of the neural net for the $t$th input set. The weights and the thresholds of the
network are iteratively updated in proportion to the gradient of the total squared
error, $E$. This can be expressed as




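$$w(n+1) = w(n) + \Delta w(n) = w(n) - \epsilon \frac{\partial E}{\partial w(n)} \qquad (2.12)$$

and

$$\theta(n+1) = \theta(n) + \Delta\theta(n) = \theta(n) - \epsilon \frac{\partial E}{\partial \theta(n)} \qquad (2.13)$$

where $w(n)$ and $\theta(n)$ are the weights and thresholds at the $n$th iteration of the
algorithm, $\Delta w(n)$ and $\Delta\theta(n)$ are the incremental changes to the weights and thresholds,
and $\epsilon$ is the proportionality constant [Ref. 2:p. 130]. The backpropagation method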
gets its name from the fact that the error at the output of the network is propagated 
back through the network in the form of gradients in order to update the network’s 
weights and thresholds. 
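
To make the update rule concrete, the sketch below applies batch steepest descent to a single sigmoidal neuron with one input, minimizing the total squared error over a small training set. The logistic sigmoid, the four training pairs, the step size of 0.5, and all function names are assumptions chosen for illustration. A multilayer network would additionally propagate the output error back through the hidden layers.

```python
import math


def sigmoid(r):
    """Logistic sigmoid (assumed transfer function)."""
    return 1.0 / (1.0 + math.exp(-r))


def total_squared_error(w, theta, data):
    """E = sum over the training set of (y - z)^2."""
    return sum((y - sigmoid(w * x + theta)) ** 2 for x, y in data)


def gradient(w, theta, data):
    """Partial derivatives of E with respect to the weight and the threshold."""
    dw = 0.0
    dtheta = 0.0
    for x, y in data:
        z = sigmoid(w * x + theta)
        common = -2.0 * (y - z) * z * (1.0 - z)  # chain rule through the sigmoid
        dw += common * x
        dtheta += common
    return dw, dtheta


def steepest_descent(data, epsilon=0.5, iterations=200):
    """Update w and theta after each complete pass through the training set."""
    w, theta = 0.0, 0.0
    history = [total_squared_error(w, theta, data)]
    for _ in range(iterations):
        dw, dtheta = gradient(w, theta, data)
        w -= epsilon * dw
        theta -= epsilon * dtheta
        history.append(total_squared_error(w, theta, data))
    return w, theta, history


# Hypothetical training set: (input, desired output) pairs.
TRAINING_SET = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w_final, theta_final, error_history = steepest_descent(TRAINING_SET)
```

The list `error_history` records the total squared error after each pass, which makes the slow tail of the convergence easy to inspect.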

The backpropagation method is essentially a steepest descent optimization
algorithm which uses the gradient of the squared error function to modify the weights
and thresholds of the neural network [Ref. 2:p. 127]. One requirement dictated by
this gradient method is that the transfer function of the neurons be continuously
differentiable [Ref. 2:p. 131]. As a result, this method cannot be used with the signum
transfer function because of its discontinuity. The method, however, does work for 
the linear and sigmoidal transfer function cases. 

As presented above, the weights and thresholds are updated after a complete 
pass of the entire training data set through the network. In the actual implementation
of the algorithm, however, Rumelhart updates the weights and thresholds of
the network after each input/desired output pair is applied [Ref. 2:pp. 136-137].
His rationale for doing this is that the algorithm converges so slowly that it does
not affect the overall convergence rate, and that it is more gratifying to update the
weights and thresholds more frequently [Ref. 2:p. 137]. As Rumelhart indicated,
the steepest descent method is extremely slow to converge. It was this deficiency that
led to the development of this thesis project. Lapedes and Farber indicated that a
related optimization method, the conjugate gradient algorithm, yielded a significant 
improvement in the convergence rate of the backpropagation method [Ref. 12]. The 
following chapter will address the development and application of this optimization 
method to a feedforward sigmoidal neural network. 

III. DERIVATION OF THE ADAPTATION
ALGORITHM 



A. THE CONJUGATE GRADIENT METHOD
1. General Description 

The conjugate gradient method is an iterative method for optimizing a 
set of coefficients h in order to minimize a given objective function J(h). It falls 
into the class of optimization methods that apply a multidimensional search using 
derivatives to the optimization problem [Ref. 13:pp. 289-316]. The steepest descent 
method, which Rumelhart uses for adapting the feedforward neural network, is also 
a member of this class [Ref. 2]. This class of optimization methods, called gradient
methods, treats the objective function J(h) as a multidimensional surface over which
it iteratively searches for the absolute or global minimum [Ref. 13:pp. 289-316]. The
coefficients h are the multidimensional coordinates which define where the algorithm
is located on the surface during any particular iteration. This class of optimization
methods requires that the objective function be differentiable with respect to the
coefficients h that are adapted [Ref. 13:p. 286]. This partial derivative is called the
gradient g of the objective function. When evaluated for a given set of coefficients h,
the gradient g is a multidimensional vector which is tangent to the objective function
surface at a point defined by the coefficients h. This vector points in the direction of
greatest increase. The negative of the gradient (-g) logically points downhill in the
direction of greatest decrease. Thus, the gradient vector g can provide a direction
along the surface of the objective function in which to search for the global minimum.
The advantage of gradient methods is that they decompose the optimization problem
from a multidimensional search of the objective function surface to a sequence of line
searches along directions determined by the gradient vector g.

The method of steepest descent uses the gradient vector g directly to per- 
form its iterative line search of the objective function surface [Ref. 14:pp. 214-220]. 
Rumelhart points out that the steepest descent method works well when the objective 
function surface is quadratic or bowl-shaped with a single global minimum [Ref. 2:p. 
132]. He states, however, that the more complex objective function surfaces associated
with multilayer neural networks frequently contain many local minima [Ref. 2:p.
132]. As a result, the steepest descent method can become trapped in one of these
local minima, yielding a less than optimal solution. This is because the magnitude of
the gradient vector decreases as the algorithm approaches a local minimum. The distance
the steepest descent algorithm travels for a given iteration is a function of a constant
times the magnitude of the gradient. Therefore, as the magnitude of the gradient
decreases, the distance the algorithm travels along the surface decreases. Compounding
the problem is the fact that each successive gradient is orthogonal to the previous
gradient. This causes the algorithm to zigzag in ever smaller steps as it approaches
the bottom of a local minimum. The result is that the algorithm becomes trapped
at the bottom of a local minimum and never reaches the optimal point or global
minimum. Use of a constant stepsize also causes the steepest descent algorithm to be
extremely slow to converge [Ref. 13:pp. 290-291].

The conjugate gradient approach is motivated by a desire to accelerate
the convergence rate of the steepest descent method without greatly increasing the
complexity of the algorithm. The conjugate gradient method uses a succession of
direction vectors $\mathbf{d}_k$ that are conjugate to the gradient vector $\mathbf{g}_k$ obtained as the
algorithm progresses. The direction along which the algorithm searches, $\mathbf{d}_k$, is a
linear combination of present and past values of the gradient vector. The result is
that the gradient vector $\mathbf{g}_k$ is orthogonal to the subspace $\Gamma_k$, which is defined by
the set of all previous direction vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{k-1}$. Each successive iteration
essentially adds an additional dimension to the subspace $\Gamma_k$. The distance $\alpha_k$ that
the algorithm travels along the line search direction $\mathbf{d}_k$ also varies for each iteration
of the algorithm. This makes the method only slightly more complicated than the
steepest descent method. The algorithm, however, does not become trapped in local
minima as easily as the steepest descent method and converges steadily to the global
minimum or optimal set of coefficients $\mathbf{h}_k$ [Ref. 13:pp. 297-316].

2. Notation Summary 

The notation used to describe the conjugate gradient method is as follows: 
$J(\mathbf{h})$  Objective function to be minimized.
$\mathbf{h}_k$  Coefficient vector at the $k$th iteration.
$\mathbf{g}_k$  Gradient vector of the objective function at the $k$th iteration.
$\mathbf{d}_k$  Search direction vector at the $k$th iteration.
$\alpha_k$  Search distance coefficient at the $k$th iteration.
$\beta_k$  Deflection coefficient at the $k$th iteration.



3. Summary of the Conjugate Gradient Algorithm 

A summary of the conjugate gradient method for minimizing a differentiable
objective function $J(\mathbf{h})$ is listed below [Ref. 13:p. 306]:

Step 1. Choose an initial set of coefficients $\mathbf{h}_0$.

Step 2. Calculate the initial gradient $\mathbf{g}_0$ using the definition

$$\mathbf{g}_0 = \frac{\partial J(\mathbf{h}_0)}{\partial \mathbf{h}_0}. \qquad (3.1)$$

Step 3. Let the initial direction vector be $\mathbf{d}_0 = -\mathbf{g}_0$.

Step 4. Let $k = 0$.

Step 5. Let $\alpha_k$ be the optimal solution to the problem to minimize
$J(\mathbf{h}_k + \alpha_k \mathbf{d}_k)$ subject to $\alpha_k \geq 0$.

Step 6. Update the new coefficients $\mathbf{h}_{k+1}$ using the equation

$$\mathbf{h}_{k+1} = \mathbf{h}_k + \alpha_k \mathbf{d}_k. \qquad (3.2)$$

Step 7. Calculate the next gradient vector value $\mathbf{g}_{k+1}$ using the new
coefficients $\mathbf{h}_{k+1}$.

Step 8. Calculate the deflection coefficient $\beta_k$ using the equation

$$\beta_k = \frac{(\mathbf{g}_{k+1} - \mathbf{g}_k)^T \mathbf{g}_{k+1}}{\mathbf{g}_k^T \mathbf{g}_k}. \qquad (3.3)$$

Step 9. Update the direction vector $\mathbf{d}_{k+1}$ using the equation

$$\mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k. \qquad (3.4)$$

Step 10. Replace $k$ by $k + 1$ and go to step 5.
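
The ten steps can be sketched in Python as follows. This is an illustration under stated assumptions and not the implementation used in this research: the line search is a golden section search used as a simple stand-in, the objective is a hypothetical two-coefficient quadratic with its minimum at (1, -2), and the upper bound `alpha_max`, the iteration count, and all function names are chosen only for the example.

```python
import math


def golden_section(phi, a, b, tol=1e-10):
    """Stand-in line search: minimize a unimodal function phi on [a, b]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc > fd:                      # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = phi(d)
        else:                            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = phi(c)
    return 0.5 * (a + b)


def conjugate_gradient(J, grad, h0, alpha_max=1.0, iterations=20, gtol=1e-8):
    """Polak-Ribiere conjugate gradient following Steps 1 through 10."""
    h = list(h0)                                         # Step 1
    g = grad(h)                                          # Step 2
    d = [-gi for gi in g]                                # Step 3
    for _ in range(iterations):                          # Steps 4 and 10
        if math.sqrt(sum(gi * gi for gi in g)) < gtol:
            break                                        # gradient is negligible
        phi = lambda a: J([hi + a * di for hi, di in zip(h, d)])
        alpha = golden_section(phi, 0.0, alpha_max)      # Step 5
        h = [hi + alpha * di for hi, di in zip(h, d)]    # Step 6
        g_new = grad(h)                                  # Step 7
        beta = (sum((gn - go) * gn for gn, go in zip(g_new, g))
                / sum(go * go for go in g))              # Step 8
        d = [-gn + beta * di for gn, di in zip(g_new, d)]  # Step 9
        g = g_new
    return h


def J_example(h):
    """Hypothetical quadratic objective with its minimum at (1, -2)."""
    return (h[0] - 1.0) ** 2 + 10.0 * (h[1] + 2.0) ** 2


def grad_example(h):
    """Analytic gradient of J_example."""
    return [2.0 * (h[0] - 1.0), 20.0 * (h[1] + 2.0)]


h_min = conjugate_gradient(J_example, grad_example, [0.0, 0.0])
```

On this two-coefficient quadratic the search reaches the neighborhood of the minimum after two line searches, which is the behavior expected of conjugate directions on a quadratic surface.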

4. Selection of a Line Search Method 

The conjugate gradient method outlined above requires that a search distance
coefficient $\alpha_k$ be found that minimizes the objective function $J(\mathbf{h}_k + \alpha_k \mathbf{d}_k)$
subject to $\alpha_k \geq 0$. This dictates that a line search be performed starting at the point
in multidimensional space defined by the current coefficient vector $\mathbf{h}_k$, and proceeding
along the line defined by the current direction vector $\mathbf{d}_k$ until the minimum value of
the objective function is found. The distance the line search algorithm travels from
the point to the minimum value of the function is then defined to be the scalar
value $\alpha_k$. A number of methods have been proposed to perform this line search.
These include the uniform search, dichotomous search, the golden section method,
and the Fibonacci method [Ref. 13:pp. 253-264]. There is also a class of line search
methods which use derivatives to assist in finding the minimum value of the objective
function [Ref. 13:pp. 264-269]. This second group of methods was considered for
use with the conjugate gradient method but was subsequently rejected due to the
complexity of calculating and evaluating the required derivatives. The selection of
an appropriate line search method for use in conjunction with the conjugate gradient 
method was based primarily on efficiency. All of the methods except for the Fibonacci 
search require two evaluations of the objective function during each iteration of the 
algorithm. The Fibonacci method, however, requires only a single evaluation because 
it also uses the results from the previous iteration. Comparison of the line search 
methods mentioned above revealed that the Fibonacci search method is the most 
efficient [Ref. 13:p. 264]. As a result, the Fibonacci search method was chosen to be 
used in conjunction with the conjugate gradient method. 

The Fibonacci method performs a search for the minimum value of a function
of a single variable over a closed bounded interval $[a, b]$. The function in this
case is $J(\mathbf{h}_k + \alpha_k \mathbf{d}_k)$ where $\alpha_k$ is the single variable. The interval over which the
algorithm searches is called the interval of uncertainty and limits the range of values
for $\alpha_k$. The lower limit for $\alpha_k$ is given by the conjugate gradient method as zero,
but the upper limit must be specified before the algorithm can begin. The interval of
uncertainty is steadily reduced as the algorithm progresses. The number of iterations
which the algorithm will perform must also be specified before the start of the
algorithm. The Fibonacci method is based on the Fibonacci sequence $F_\nu$ which is defined
as

$$F_{\nu+1} = F_\nu + F_{\nu-1} \qquad (3.5)$$

$$F_0 = F_1 = 1. \qquad (3.6)$$

The resulting sequence is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, .... The Fibonacci search
method begins by evaluating the objective function at each of two points within the
interval of uncertainty as shown in Figure 3.1.

These two points, which we will call $\lambda_j$ and $\mu_j$, are calculated using

$$\lambda_j = a_j + \frac{F_{n-j-1}}{F_{n-j+1}} (b_j - a_j) \qquad (3.7)$$

and

$$\mu_j = a_j + \frac{F_{n-j}}{F_{n-j+1}} (b_j - a_j) \qquad (3.8)$$

where $k$ is the iteration index of the conjugate gradient algorithm, $j$ is the iteration
index of the Fibonacci algorithm, $[a_j, b_j]$ is the current interval of uncertainty, and $n$
is the total number of iterations planned. A new interval of uncertainty $[a_{j+1}, b_{j+1}]$
is then selected based on the value of the objective function at the two points $\lambda_j$ and
$\mu_j$. If $J(\mathbf{h}_k + \lambda_j \mathbf{d}_k) > J(\mathbf{h}_k + \mu_j \mathbf{d}_k)$, then the new interval of uncertainty $[a_{j+1}, b_{j+1}]$
is given by $[\lambda_j, b_j]$. Likewise, if the opposite is true, $J(\mathbf{h}_k + \lambda_j \mathbf{d}_k) < J(\mathbf{h}_k + \mu_j \mathbf{d}_k)$,
then the new interval of uncertainty is $[a_j, \mu_j]$. The two cases are shown in Figures 3.2 and 3.3.
The key feature that makes the Fibonacci method so attractive is that, for the next
iteration $j + 1$, either $\lambda_{j+1} = \mu_j$ or $\mu_{j+1} = \lambda_j$, depending on which new interval of
uncertainty was selected. Since the objective function has already been evaluated at
the previous values for $\lambda_j$ and $\mu_j$, only one additional evaluation must be made
for each succeeding iteration. At the completion of the specified $n$ iterations of the
algorithm, the size of the final interval of uncertainty will be

$$(b_n - a_n) = \frac{b_0 - a_0}{F_{n+1}}. \qquad (3.9)$$






Figure 3.1: Initial evaluation points $\lambda_0$ and $\mu_0$ and interval of uncertainty

Figure 3.2: Evaluation points $\lambda_{j+1}$ and $\mu_{j+1}$ and revised interval of uncertainty when $J(\lambda_j) > J(\mu_j)$

Figure 3.3: Evaluation points $\lambda_{j+1}$ and $\mu_{j+1}$ and revised interval of uncertainty when $J(\lambda_j) < J(\mu_j)$

If we select the midpoint of the final interval of uncertainty as the value $\alpha_k$ to be used
by the conjugate gradient method, then we can calculate the number of iterations $n$
required to achieve a desired accuracy after deciding upon an upper bound $b_0$. The
upper bound and number of iterations used for the neural network problem will be
presented in the next chapter.
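
A compact sketch of the Fibonacci search is given below. It follows the common textbook form in which `n_evals` counts objective function evaluations and the final interval has width (b - a)/F_n_evals, so its indexing may differ by one from the iteration count used in the text. The small offset `eps` used for the last comparison, the helper `evaluations_needed`, and the test function are assumptions introduced for illustration.

```python
def fibonacci_numbers(n):
    """Return [F_0, ..., F_n] with F_0 = F_1 = 1."""
    F = [1, 1]
    while len(F) <= n:
        F.append(F[-1] + F[-2])
    return F


def evaluations_needed(width, tol):
    """Smallest n such that width / F_n <= tol."""
    F = [1, 1]
    while width / F[-1] > tol:
        F.append(F[-1] + F[-2])
    return len(F) - 1


def fibonacci_search(phi, a, b, n_evals, eps=1e-9):
    """Shrink [a, b] around the minimum of a unimodal phi using n_evals evaluations."""
    n = n_evals
    if n < 3:
        raise ValueError("n_evals must be at least 3")
    F = fibonacci_numbers(n)
    lam = a + F[n - 2] / F[n] * (b - a)
    mu = a + F[n - 1] / F[n] * (b - a)
    f_lam, f_mu = phi(lam), phi(mu)
    for k in range(1, n - 2):
        if f_lam > f_mu:
            # Minimum lies in [lam, b]; the old mu becomes the new lam.
            a = lam
            lam, f_lam = mu, f_mu
            mu = a + F[n - k - 1] / F[n - k] * (b - a)
            f_mu = phi(mu)               # the single new evaluation
        else:
            # Minimum lies in [a, mu]; the old lam becomes the new mu.
            b = mu
            mu, f_mu = lam, f_lam
            lam = a + F[n - k - 2] / F[n - k] * (b - a)
            f_lam = phi(lam)             # the single new evaluation
    # Next reduction: the retained point falls on the midpoint of the new interval.
    if f_lam > f_mu:
        a, mid, f_mid = lam, mu, f_mu
    else:
        b, mid, f_mid = mu, lam, f_lam
    # Final evaluation just to the right of the midpoint selects one half.
    if f_mid > phi(mid + eps):
        a = mid
    else:
        b = mid + eps
    return a, b


# Hypothetical usage: minimum of (alpha - 0.3)^2 on [0, 1] with 20 evaluations.
lo, hi = fibonacci_search(lambda alpha: (alpha - 0.3) ** 2, 0.0, 1.0, 20)
alpha_mid = 0.5 * (lo + hi)
```

Taking the midpoint of the returned interval gives a search distance whose error is at most half the final interval width, which is how a required accuracy translates into a number of evaluations through `evaluations_needed`.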

5. Calculation of the Deflection Coefficient j3 k 

The equation used to calculate the deflection constant $\beta_k$ (equation 3.3)
is the Polak-Ribiere version of the conjugate gradient method originally proposed by
Fletcher and Reeves [Ref. 14:p. 253]. The original method used the equation

$$ \beta_k = \frac{\mathbf{g}_{k+1}^T \mathbf{g}_{k+1}}{\mathbf{g}_k^T \mathbf{g}_k} \tag{3.10} $$

to calculate the deflection constant $\beta_k$. The two equations are equivalent if the
objective function to be minimized is quadratic. Experimental results, however, tend
to indicate that the Polak-Ribiere method is more effective for nonquadratic
objective functions [Ref. 14:p. 254]. This is because the Polak-Ribiere method tends to
reset the direction vector $\mathbf{d}_{k+1}$ to the value of the gradient vector $\mathbf{g}_{k+1}$ when
two successive gradients $\mathbf{g}_k$ and $\mathbf{g}_{k+1}$ are equal. This has the effect of beginning the
conjugate gradient method anew, using the present coefficient vector $\mathbf{h}_k$ as the new
initial coefficient vector $\mathbf{h}_0$.
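
The difference between the two deflection formulas can be illustrated with a short Python sketch. The Fletcher-Reeves function follows equation 3.10; the Polak-Ribiere function uses the standard textbook form $\mathbf{g}_{k+1}^T(\mathbf{g}_{k+1} - \mathbf{g}_k) / \mathbf{g}_k^T \mathbf{g}_k$, which is assumed here to match equation 3.3. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def beta_fletcher_reeves(g_next, g):
    """Deflection coefficient of equation 3.10."""
    return dot(g_next, g_next) / dot(g, g)


def beta_polak_ribiere(g_next, g):
    """Standard Polak-Ribiere form (assumed to correspond to equation 3.3)."""
    difference = [a - b for a, b in zip(g_next, g)]
    return dot(g_next, difference) / dot(g, g)


if __name__ == "__main__":
    g = [1.0, 2.0]
    # Identical successive gradients: Polak-Ribiere gives zero (a reset),
    # while Fletcher-Reeves gives one.
    print(beta_polak_ribiere(g, g), beta_fletcher_reeves(g, g))
    # Orthogonal successive gradients, as produced by an exact line search
    # on a quadratic function: the two formulas agree.
    print(beta_polak_ribiere([0.0, 2.0], [1.0, 0.0]),
          beta_fletcher_reeves([0.0, 2.0], [1.0, 0.0]))
```

A coefficient of zero removes the contribution of the previous direction, which is the reset behavior described in the text.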

B. APPLYING THE CONJUGATE GRADIENT METHOD TO A NEURAL NETWORK

1. The Neural Network Model and Notation 

The generic neural network model to be used for the purposes of discussion 
is pictured in Figure 3.4. The notation used when referring to the various variables 
of the model is as follows: 
Figure 3.4: Neural network model



$x_{ij}$   The $j$th input to the $i$th layer of the network. For other than the inputs
$x_{01}, x_{02}, \ldots, x_{0l}$, the variable $x_{ij}$ is also the output of the $j$th neuron in the
$(i-1)$th layer and is a function of the previous layer's inputs and weights and
the $j$th neuron's threshold value.

$w_{ijk}$   The weight in the $i$th layer of the network that connects the $j$th input $x_{ij}$ to the
$k$th neuron of the layer.

$\theta_{ik}$   The threshold value associated with the $k$th neuron of the $i$th layer of neurons.

$y$   The desired output value of the network for a given set of inputs $x_{01}, x_{02}, \ldots, x_{0l}$.

$f(\cdot)$   The transfer function of the neuron.



2. The Neural Network Objective Function $J(\mathbf{h})$

As was mentioned in the previous chapter, we wish to minimize the total
sum of the squared errors over an entire training data set. As a result, the objective
function $J(\mathbf{h})$ to be minimized using the conjugate gradient method is

$$ E = \frac{1}{2} \sum_t e^2(t) \tag{3.11} $$

where $e(t)$ is the error between the actual and the desired outputs of the neural
network for the $t$th data set.

3. The Adaptation Coefficients h 

There are two quantities that we wish to adapt in order for the neural
network to consistently produce the desired output for a given input. These two
quantities are the connection weights $w_{ijk}$ of the network and the threshold values $\theta_{ik}$
associated with each neuron in the network. Together, these two sets of coefficients
form the coefficient vector $\mathbf{h}$. The conjugate gradient algorithm uses a single vector
$\mathbf{h}$ to represent the coefficients which are adapted to minimize the objective function
$J(\mathbf{h})$. The notation used for the neural network model, however, reflects the use of
matrices $[w_{ijk}]$ for the weights and vectors $[\theta_{ik}]$ for the thresholds. This was done to
simplify the identification of the various weights and thresholds. We must therefore
combine and transform the weight matrices and threshold vectors into a single vector
$\mathbf{h}$ in order to apply the conjugate gradient algorithm. This is done by assigning the
individual weights and thresholds to a vector as shown in equation 3.12.

$$ \mathbf{h} = [w_{011}, w_{012}, \ldots, w_{0lm}, w_{111}, \ldots, w_3, \theta_{01}, \theta_{02}, \ldots, \theta_2] \tag{3.12} $$



We can perform the conjugate gradient algorithm using the vector notation and then
perform a reverse transformation at the completion of the algorithm to assign the
final weights and threshold values to the neural network.
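
The forward and reverse transformations between the network's weight matrices and the single vector $\mathbf{h}$ might look like the following Python sketch. The nested-list layout (`weights[i][j][k]` for $w_{ijk}$, `thresholds[i][k]` for $\theta_{ik}$) and the function names are assumptions for illustration; the thesis program's own data structures are not shown in this section.

```python
def pack(weights, thresholds):
    """Flatten weights[i][j][k] and thresholds[i][k] into one vector h.

    All weights come first, layer by layer, followed by all thresholds,
    mirroring the ordering of equation 3.12.
    """
    h = []
    for layer in weights:
        for row in layer:
            h.extend(row)
    for layer in thresholds:
        h.extend(layer)
    return h


def unpack(h, weight_shapes, threshold_sizes):
    """Reverse transformation: rebuild the per-layer structures from h.

    weight_shapes is a list of (inputs, neurons) pairs, one per layer, and
    threshold_sizes is a list of neuron counts, one per layer.
    """
    position = 0
    weights = []
    for n_inputs, n_neurons in weight_shapes:
        layer = []
        for _ in range(n_inputs):
            layer.append(list(h[position:position + n_neurons]))
            position += n_neurons
        weights.append(layer)
    thresholds = []
    for n_neurons in threshold_sizes:
        thresholds.append(list(h[position:position + n_neurons]))
        position += n_neurons
    return weights, thresholds


if __name__ == "__main__":
    w = [[[0.1, 0.2], [0.3, 0.4]], [[0.5], [0.6]]]
    th = [[0.7, 0.8], [0.9]]
    h = pack(w, th)
    print(h)
    print(unpack(h, [(2, 2), (2, 1)], [2, 1]) == (w, th))
```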

4. The Gradient Vector g 

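The gradient vector $\mathbf{g}$ used by the conjugate gradient method is defined as

$$ \mathbf{g} = \nabla J(\mathbf{h}) = \left[ \frac{\partial J}{\partial h_1}, \frac{\partial J}{\partial h_2}, \ldots, \frac{\partial J}{\partial h_n} \right]^T. \tag{3.13} $$

The gradient vector $\mathbf{g}$ for the neural network problem consists of the gradients
associated with the weights and thresholds of the neural network. The gradient vector $\mathbf{g}$
would be of the form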



$$ \mathbf{g} = [g_{011}, g_{012}, \ldots, g_{0lm}, g_{111}, \ldots, g_3, g_{\theta_{01}}, g_{\theta_{02}}, \ldots, g_{\theta_2}]^T. \tag{3.14} $$



The gradient for any particular weight or threshold of the network is calculated by
taking the partial derivative of the error function $E$ with respect to the particular
weight ($w_{ijk}$) or threshold ($\theta_{ik}$). For the gradient associated with a weight this would
be expressed as



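$$ g_{ijk} = \frac{\partial E}{\partial w_{ijk}} = \frac{1}{2} \frac{\partial}{\partial w_{ijk}} \left[ \sum_t e^2(t) \right] \tag{3.15} $$

and for the gradient associated with a threshold as

$$ g_{\theta_{ik}} = \frac{\partial E}{\partial \theta_{ik}} = \frac{1}{2} \frac{\partial}{\partial \theta_{ik}} \left[ \sum_t e^2(t) \right]. \tag{3.16} $$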



The partial derivative in equations 3.15 and 3.16 can be moved inside the respective 
summation terms resulting in the following expressions 



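$$ g_{ijk} = \frac{\partial E}{\partial w_{ijk}} = \sum_t \frac{1}{2} \frac{\partial}{\partial w_{ijk}} e^2(t) \tag{3.17} $$

and

$$ g_{\theta_{ik}} = \frac{\partial E}{\partial \theta_{ik}} = \sum_t \frac{1}{2} \frac{\partial}{\partial \theta_{ik}} e^2(t). \tag{3.18} $$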
The gradient for each weight can therefore be expressed as the sum of the partial
gradients

$$ g_{ijk} = \sum_t g_{ijk}(t) \tag{3.19} $$

where the partial gradient $g_{ijk}(t)$ is the gradient associated with the weight $w_{ijk}$ when
evaluated for a single set of training data rather than the entire training data set. The
gradients associated with the threshold values of the neural network can be expressed
in a similar manner, given by

$$ g_{\theta_{ik}} = \sum_t g_{\theta_{ik}}(t). \tag{3.20} $$

For the purposes of notational brevity, we will assume that the training data set
consists of only one set of inputs and the associated desired output. This will allow
us to reduce the length of the equations for the gradient by removing references to the
particular element of the training set used. The reader should remember, however,
that if there are $s$ pairs of data in the training set, then the gradient is the sum of
the $s$ partial gradients as expressed in equation 3.19 and equation 3.20.
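
The summation of partial gradients can be checked numerically on a deliberately tiny model. The sketch below assumes a single linear weight with error $e = y - wx$ (the same form as the output weight $w_3$ treated later in this chapter); the function names and sample data are hypothetical.

```python
def partial_gradient(w, x, y):
    """Gradient of 0.5 * (y - w*x)**2 with respect to w for one training pair."""
    return (w * x - y) * x


def total_gradient(w, training_set):
    """Sum of the partial gradients over the whole training set."""
    return sum(partial_gradient(w, x, y) for x, y in training_set)


def total_error(w, training_set):
    """E = 0.5 * sum of squared errors over the training set."""
    return 0.5 * sum((y - w * x) ** 2 for x, y in training_set)


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.0)]
    w, step = 0.7, 1e-6
    numeric = (total_error(w + step, data) - total_error(w - step, data)) / (2 * step)
    print(total_gradient(w, data), numeric)
```

The two printed values should agree closely, since differentiating the total error is the same as summing the per-pair gradients.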
a. Neuron Transfer Function Derivative 

Before delving into the derivation of the equations for the gradients
of the weights and thresholds of the neural network, a few comments should be made
concerning the transfer function used for the neural network model and its derivative.
The transfer function to be used is the sigmoid function defined by equation 2.9
in Chapter 2. A key feature of the sigmoidal function is that its derivative can be
expressed in terms of its original value by

$$ \frac{d}{dx} f(x) = -f(x)\,(1 - f(x)). \tag{3.21} $$

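The derivative of a neuron's output can thus be expressed as a function of the output
of the neuron and the partial derivative of the neuron's inputs. The partial derivative
of the neuron's output with respect to $w_{ijk}$ is then given by

$$ \frac{\partial x_{i+1,k}}{\partial w_{ijk}} = (-x_{i+1,k})(1 - x_{i+1,k}) \frac{\partial}{\partial w_{ijk}} \left( \theta_{ik} - \sum_p w_{ipk} x_{ip} \right) \tag{3.22} $$

and

$$ \frac{\partial x_{i+1,k}}{\partial \theta_{ik}} = (-x_{i+1,k})(1 - x_{i+1,k}) \frac{\partial}{\partial \theta_{ik}} \left( \theta_{ik} - \sum_p w_{ipk} x_{ip} \right) \tag{3.23} $$

for the derivative with respect to $\theta_{ik}$. Equations 3.22 and 3.23 will be used frequently
to evaluate the partial derivatives of each neuron's output when deriving the equations
for the gradients of the neural network.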
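
The derivative identity of equation 3.21 can be verified numerically. Equation 2.9 is not reproduced in this chapter, so the sketch below assumes the sigmoid is written as $f(s) = 1/(1 + e^{s})$ with $s = \theta - \sum wx$; this is the form that yields the leading minus sign in equation 3.21. The function names are hypothetical.

```python
import math


def f(s):
    """Assumed sigmoid form: f(s) = 1 / (1 + exp(s)), with s = theta - sum(w*x)."""
    return 1.0 / (1.0 + math.exp(s))


def f_prime(s):
    """Derivative expressed through the function value, as in equation 3.21."""
    return -f(s) * (1.0 - f(s))


def numeric_derivative(func, s, step=1e-6):
    """Central-difference approximation used only as a check."""
    return (func(s + step) - func(s - step)) / (2.0 * step)


if __name__ == "__main__":
    for s in (-2.0, 0.0, 1.5):
        print(s, f_prime(s), numeric_derivative(f, s))
```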

b. Calculation of the Third Layer Gradient 

The calculation of the gradients for each weight and threshold of the
neural network begins at the output of the neural network where the difference
between the actual network output and the desired output produces an error. This
error is propagated back through the network in the form of gradients. The gradient
associated with the output weight $w_3$ can be expressed as

$$ g_3 = \frac{\partial E}{\partial w_3} = \frac{\partial}{\partial w_3} \frac{1}{2} (y - w_3 x_3)^2 \tag{3.24} $$

where $w_3 x_3$ is the output of the network and $y$ is the desired output value. Taking
the partial derivative yields

$$ g_3 = (y - w_3 x_3) \frac{\partial}{\partial w_3} (y - w_3 x_3) = (y - w_3 x_3)(-x_3). \tag{3.25} $$
After rearranging the terms of equation 3.25, the final form for the output weight's
gradient $g_3$ becomes

$$ g_3 = (w_3 x_3 - y)\, x_3. \tag{3.26} $$

c. Calculation of the Second Layer Gradients 

Derivation of the input, first and second layer gradients is somewhat
more involved than that of the third layer gradients because of the multiple neurons
and weights between the error at the output and the gradient for which we are deriving
an expression. The gradient equation for a weight in the second layer can be expressed
as

$$ g_{2j} = \frac{\partial E}{\partial w_{2j}} = (y - w_3 x_3) \frac{\partial}{\partial w_{2j}} (y - w_3 x_3). \tag{3.27} $$

Of the terms evaluated by the partial derivative, only the output of the third layer
neuron $x_3$ is affected by a variation of the second layer weight $w_{2j}$. The desired output
$y$ can be eliminated and the partial derivative shifted to the right of the output weight
term $w_3$. This yields the expression

$$ g_{2j} = (y - w_3 x_3)(-w_3) \frac{\partial}{\partial w_{2j}} (x_3). \tag{3.28} $$

We can replace the partial derivative term in equation 3.28 with an equivalent
expression that can be evaluated with respect to $w_{2j}$ using equation 3.22. This results
in the following expression

$$ g_{2j} = \underbrace{(y - w_3 x_3)(-x_3)}_{g_3} (-w_3)(1 - x_3) \frac{\partial}{\partial w_{2j}} \left( \theta_2 - \sum_p w_{2p} x_{2p} \right). \tag{3.29} $$

Comparing the first part of equation 3.29 with equation 3.26, we find that we can
replace the first two terms of equation 3.29 with the output weight's gradient $g_3$.
After taking the partial derivative, only one term, $-x_{2j}$, remains. The equation for the
second layer weight gradient becomes

$$ g_{2j} = g_3 (-w_3)(1 - x_3)(-x_{2j}) = g_3 w_3 (1 - x_3)\, x_{2j}. \tag{3.30} $$
We can see from equation 3.30 that the gradient $g_{2j}$ is a function of the weight $w_3$ that
connects the neuron's output to the next layer, the gradient $g_3$ that is associated with
the output weight, the neuron's output value $x_3$, and the input $x_{2j}$ that is applied to
the weight for which we are calculating the gradient ($g_{2j}$). This relationship between
the inputs, outputs, weights and gradients will be found to be consistent for each of
the gradients of the neural network.



Rather than starting from scratch to derive the equation for the gradient
associated with the output neuron's threshold $\theta_2$, we begin at the point where
evaluation of the partial derivative with respect to $\theta_2$ differs from that for the weight
gradient $g_{2j}$. The equation for the gradient of the output neuron's threshold becomes

$$ g_{\theta_2} = \frac{\partial E}{\partial \theta_2} = g_3 (-w_3)(1 - x_3) \frac{\partial}{\partial \theta_2} \left( \theta_2 - \sum_p w_{2p} x_{2p} \right). \tag{3.31} $$

Evaluation of the partial derivative yields a constant of one since none of the
summation terms is a function of the threshold value $\theta_2$. Shifting the sign term, the final
form for the gradient is

$$ g_{\theta_2} = -g_3 w_3 (1 - x_3). \tag{3.32} $$

Note that the equation for the gradient of the neuron's threshold value $\theta_2$ (equation
3.32) has the same form as that for the input weights $w_{2j}$ connected to the output
neuron (equation 3.29) except for the input term $x_{2j}$. We can treat the threshold
value as a weight if we assume that the threshold 'weight' has a constant input of $-1$.

d. Calculation of the First Layer Gradients

The derivation of the equation for the gradient of the first layer weights
follows in a similar fashion to that of the second layer. We begin at the point where
evaluation of the partial derivative differs (equation 3.29). The equation for the first
layer weight gradient becomes

$$ g_{1jk} = \frac{\partial E}{\partial w_{1jk}} = g_3 (-w_3)(1 - x_3) \frac{\partial}{\partial w_{1jk}} \left( \theta_2 - \sum_r w_{2r} x_{2r} \right). \tag{3.33} $$

Only the output of the $k$th neuron in the second layer ($x_{2k}$) is affected by the value of
the weight $w_{1jk}$ of the first layer. Therefore all terms except for the $k$th term of the
summation in equation 3.33 are zero when the partial derivative is taken. This yields
the expression

$$ g_{1jk} = -g_3 w_3 (1 - x_3)(-w_{2k}) \frac{\partial}{\partial w_{1jk}} x_{2k}. \tag{3.34} $$

Using equation 3.22 we can rewrite equation 3.34 as

$$ g_{1jk} = \underbrace{-g_3 w_3 (1 - x_3)(-x_{2k})}_{g_{2k}} (-w_{2k})(1 - x_{2k}) \frac{\partial}{\partial w_{1jk}} \left( \theta_{1k} - \sum_r w_{1rk} x_{1r} \right). \tag{3.35} $$

The first part of equation 3.35 can be replaced with the gradient $g_{2k}$ using equation
3.30. Only the $j$th term of the summation under evaluation by the partial derivative
with respect to $w_{1jk}$ is nonzero. The equation for the first layer gradients of the
weights then becomes

$$ g_{1jk} = g_{2k} (-w_{2k})(1 - x_{2k})(-x_{1j}) \tag{3.36} $$

which when rearranged yields

$$ g_{1jk} = g_{2k} w_{2k} (1 - x_{2k})\, x_{1j}. \tag{3.37} $$

Again, the present layer's gradient is a function of the next layer's gradients and
weights, the present layer's neuron output values, and the input to the present layer.

The derivation of the equation for the gradient associated with the
neuron thresholds of the first layer follows in the same manner as that of the second
layer. The equation for the threshold gradients of the first layer can be expressed as

$$ g_{\theta_{1k}} = g_{2k} (-w_{2k})(1 - x_{2k}) \frac{\partial}{\partial \theta_{1k}} \left( \theta_{1k} - \sum_r w_{1rk} x_{1r} \right). \tag{3.38} $$

Evaluating the partial derivative results in the final equation

$$ g_{\theta_{1k}} = -g_{2k} w_{2k} (1 - x_{2k}) \tag{3.39} $$

which has the same form as equation 3.32.

e. Calculation of the Input Layer Gradients 

Derivation of the input layer's gradient equation differs only slightly
from the previous development. The difference is due to the fact that a variation in
the value of a weight in the input layer affects the output of more than a single neuron
in the next layer of the network. This means that we must retain a summation term
throughout the calculation of the input layer's gradient equation. The gradient for the
input layer weight can be expressed as

$$ g_{0jk} = \frac{\partial E}{\partial w_{0jk}} = g_3 (-w_3)(1 - x_3) \frac{\partial}{\partial w_{0jk}} \left( \theta_2 - \sum_p w_{2p} x_{2p} \right). \tag{3.40} $$

The threshold $\theta_2$ is not a function of the input layer's weights and is eliminated when
its partial derivative is taken with respect to the input weight $w_{0jk}$. The other terms
under evaluation by the partial derivative (i.e., $x_{2p}$) are all, however, a function of
the input weight $w_{0jk}$. The partial derivative can be moved inside the summation
resulting in

$$ g_{0jk} = g_3 w_3 (1 - x_3) \sum_p w_{2p} \frac{\partial}{\partial w_{0jk}} x_{2p}. \tag{3.41} $$

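Shifting the summation to the far left and evaluating the partial derivative using
equation 3.22 yields

$$ g_{0jk} = \sum_p g_3 w_3 (1 - x_3)\, w_{2p} (-x_{2p})(1 - x_{2p}) \frac{\partial}{\partial w_{0jk}} \left( \theta_{1p} - \sum_q w_{1qp} x_{1q} \right). \tag{3.42} $$

The $\theta_{1p}$ term in equation 3.42 can be eliminated since it is not a function of $w_{0jk}$.
The remaining terms can then be rearranged to produce

$$ g_{0jk} = \sum_p \underbrace{g_3 w_3 (1 - x_3)\, x_{2p}}_{g_{2p}}\, w_{2p} (1 - x_{2p}) \frac{\partial}{\partial w_{0jk}} \sum_q w_{1qp} x_{1q}. \tag{3.43} $$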



The value $g_{2p}$ can be substituted for the first part of equation 3.43 using equation
3.30. Also, the output of the $k$th neuron of the input layer, $x_{1k}$, is a function of the
input weight $w_{0jk}$. As a result, evaluating the partial derivative using equation 3.22
results in the equation

$$ g_{0jk} = \sum_p g_{2p} w_{2p} (1 - x_{2p})\, w_{1kp} (-x_{1k})(1 - x_{1k}) \frac{\partial}{\partial w_{0jk}} \left( \theta_{0k} - \sum_r w_{0rk} x_{0r} \right). \tag{3.44} $$

Evaluating the partial derivative in equation 3.44 with respect to $w_{0jk}$ we find that
only the $j$th term of the summation is nonzero. Rearranging the terms yields

$$ g_{0jk} = \sum_p g_{2p} w_{2p} (1 - x_{2p})\, x_{1k}\, w_{1kp} (1 - x_{1k})\, x_{0j}. \tag{3.45} $$

Finally, we can replace the first four terms of equation 3.45 with the value $g_{1kp}$ using
equation 3.37. This results in the equation for the gradients of the weights of the
input layer of the network

$$ g_{0jk} = \sum_p g_{1kp} w_{1kp} (1 - x_{1k})\, x_{0j}. \tag{3.46} $$

Using the same reasoning used to derive equations 3.32 and 3.39 we can express the
equation for the gradient of the input layer neuron thresholds as

$$ g_{\theta_{0k}} = -\sum_p g_{1kp} w_{1kp} (1 - x_{1k}). \tag{3.47} $$

Derivation of the equations for the gradients associated with the weights and
thresholds of the neural network is now complete. What we have found is that the gradients
for any particular layer of the network can be expressed as a function of the given
layer's weights, thresholds, inputs, outputs, and the following layer's gradients. It is
not necessary to begin at the output of the network and use the network output error
$e(t)$ to calculate the gradient for a particular weight or threshold which is several
layers back in the network. The above expressions for the gradient do, however, dictate
that the gradient calculations begin at the output of the network and the gradients
be propagated back through the network.
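
The layer-by-layer gradient recursion derived in this section can be collected into a short Python sketch and checked against finite differences. Everything concrete here is an assumption for illustration: the dictionary keys, the small 2-2-2-1 network, the parameter values, and the sigmoid form $f(s) = 1/(1 + e^{s})$ with $s = \theta - \sum wx$, chosen to be consistent with equation 3.21. It is not the thesis program.

```python
import copy
import math


def f(s):
    """Assumed sigmoid form consistent with equation 3.21."""
    return 1.0 / (1.0 + math.exp(s))


def forward(p, x0):
    """Propagate the inputs x0 through the three neuron layers."""
    x1 = [f(p["t0"][k] - sum(p["w0"][j][k] * x0[j] for j in range(len(x0))))
          for k in range(len(p["t0"]))]
    x2 = [f(p["t1"][q] - sum(p["w1"][k][q] * x1[k] for k in range(len(x1))))
          for q in range(len(p["t1"]))]
    x3 = f(p["t2"] - sum(p["w2"][q] * x2[q] for q in range(len(x2))))
    return x1, x2, x3


def error(p, x0, y):
    """E = 0.5 * (y - w3*x3)**2 for a single training pair."""
    return 0.5 * (y - p["w3"] * forward(p, x0)[2]) ** 2


def gradients(p, x0, y):
    """Gradients from the output back to the input layer."""
    x1, x2, x3 = forward(p, x0)
    n0, n1, n2 = len(x0), len(x1), len(x2)
    g3 = (p["w3"] * x3 - y) * x3                                  # output weight
    g2 = [g3 * p["w3"] * (1 - x3) * x2[q] for q in range(n2)]     # second layer
    gt2 = -g3 * p["w3"] * (1 - x3)                                # weight with input -1
    g1 = [[g2[q] * p["w2"][q] * (1 - x2[q]) * x1[k] for q in range(n2)]
          for k in range(n1)]                                     # first layer
    gt1 = [-g2[q] * p["w2"][q] * (1 - x2[q]) for q in range(n2)]
    back = [sum(g1[k][q] * p["w1"][k][q] for q in range(n2)) * (1 - x1[k])
            for k in range(n1)]                                   # retained summation
    g0 = [[back[k] * x0[j] for k in range(n1)] for j in range(n0)]
    gt0 = [-back[k] for k in range(n1)]
    return {"w3": g3, "w2": g2, "t2": gt2, "w1": g1, "t1": gt1,
            "w0": g0, "t0": gt0}


def numeric_partial(p, x0, y, path, step=1e-6):
    """Central-difference derivative of E for the parameter addressed by path."""
    def shifted(delta):
        q = copy.deepcopy(p)
        target = q
        for key in path[:-1]:
            target = target[key]
        target[path[-1]] += delta
        return error(q, x0, y)
    return (shifted(step) - shifted(-step)) / (2 * step)


def lookup(g, path):
    """Fetch the analytic gradient stored at the same path."""
    for key in path:
        g = g[key]
    return g


EXAMPLE = {"w0": [[0.3, -0.2], [0.5, 0.1]], "t0": [0.1, -0.3],
           "w1": [[0.4, -0.6], [0.2, 0.7]], "t1": [0.05, 0.2],
           "w2": [0.8, -0.5], "t2": 0.1, "w3": 1.5}

if __name__ == "__main__":
    g = gradients(EXAMPLE, [0.6, 0.9], 0.7)
    for path in [("w3",), ("w2", 0), ("t2",), ("w1", 1, 0), ("t1", 1),
                 ("w0", 0, 1), ("t0", 0)]:
        print(path, lookup(g, path), numeric_partial(EXAMPLE, [0.6, 0.9], 0.7, path))
```

Each threshold gradient is obtained from the corresponding weight expression by replacing the input term with $-1$, as noted in the text.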
5. Fibonacci Line Search Parameters



Several parameters associated with the Fibonacci line search method must
be specified before the conjugate gradient algorithm described in this chapter can be
applied. These parameters are:

• The initial size of the interval of uncertainty 

• The number of iterations that the line search should perform. 

The Fibonacci line search attempts to find the best stepsize ($\alpha_k$) with which to step
along the error function surface towards the global minimum in a direction defined
by the direction vector ($\mathbf{d}_k$). The initial interval of uncertainty is the interval over
which the algorithm will search for the optimal stepsize ($\alpha_k$). The initial interval,
therefore, establishes the minimum and maximum stepsize values. Our goal is to find
the optimal set of weights and thresholds by moving steadily down the error function
surface towards the global minimum. The lower bound of the interval, or minimum
stepsize value, is therefore zero since a negative value would move the algorithm up the
error function surface in a direction opposite the direction vector ($\mathbf{d}_k$). Selection of an
upper bound for the interval entails a number of tradeoffs. A larger maximum value
would allow the algorithm to search over a greater interval for the optimal stepsize
($\alpha_k$). This could allow the conjugate gradient algorithm to converge to the global
minimum more quickly by enabling it to step farther down the error function surface
at each iteration of the algorithm. It could also possibly provide more protection
against being trapped in a local minimum by allowing the line search algorithm to
search beyond the confines of a local minimum. A larger interval, however, requires
that a greater number of iterations be performed to reduce the interval of uncertainty
to the required degree. This final interval of uncertainty must be small so that the
midpoint of the interval is reasonably close to the optimal stepsize value. It is this
midpoint that is the stepsize value $\alpha_k$ that will be used by the conjugate gradient
algorithm to update the weights and thresholds of the neural network. A larger final
interval of uncertainty increases the chances of a less than optimal choice for the final
stepsize. A balance must therefore be struck between the size of the initial interval of
uncertainty, the size of the final interval of uncertainty, and the number of iterations
to be performed.

Initial investigations were performed to determine the range of stepsize
values that were typical for various neural network applications. It was found that
the stepsize ($\alpha_k$) generally did not exceed a value of 10.0 and was typically less than
1.0. An initial interval of uncertainty of 10.0 was therefore used throughout the remainder
of the thesis research.

In the course of determining the initial interval of uncertainty it was found
that the line search method would occasionally yield a final stepsize value ($\alpha_k$) which
produced an error function value much greater than the previous iteration's value. It
was determined that this problem was a result of the error function surface not being
unimodal in the direction ($\mathbf{d}_k$) along which the algorithm searched for the minimum.
If a second minimum was closer to one of the two evaluation points ($\lambda_j$ and $\mu_j$)
than the true minimum, as shown in Figure 3.5, then the algorithm would converge to
this second minimum. This would result in an error function value larger than when
the line search algorithm started. To remedy this problem, the initial interval of
uncertainty was shifted to the left so that the first point evaluated was for $\lambda_0 = 0$. If
the error function for the final stepsize value ($\alpha_k$) was greater than the error function
value with a stepsize of zero, then a stepsize of zero was returned as the final stepsize
value ($\alpha_k$). This had the effect of resetting the conjugate gradient algorithm. A
stepsize of zero caused the algorithm to retain the same weights and thresholds for
the next iteration of the algorithm. As a result, the gradient ($\mathbf{g}_{k+1}$) at the next
iteration was identical to the previous gradient ($\mathbf{g}_k$) and the two successive identical
gradients would produce a deflection coefficient ($\beta_k$) equal to zero. This would reset
the direction vector ($\mathbf{d}_k$) to the value of the present gradient ($\mathbf{g}_k$) rather than the
weighted sum of previous gradients. This had the effect of reinitializing the conjugate
gradient method, but at a new starting point ($\mathbf{h}_k$) on the error function surface.
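
The safeguard described above amounts to a single comparison after the line search has finished. The following Python sketch assumes the error along the search direction is available as a function of the stepsize; the name `safeguarded_step` and the quadratic profile in the usage example are hypothetical.

```python
def safeguarded_step(error_along_direction, alpha):
    """Return alpha only if it does not increase the error relative to a zero step.

    error_along_direction(a) is assumed to evaluate the error function at
    h_k + a * d_k. A returned value of 0.0 leaves the weights unchanged,
    which in turn resets the conjugate gradient direction as described above.
    """
    if error_along_direction(alpha) > error_along_direction(0.0):
        return 0.0
    return alpha


if __name__ == "__main__":
    profile = lambda a: (a - 1.0) ** 2   # illustrative error profile along d_k
    print(safeguarded_step(profile, 0.9))   # accepted: lower error than at zero
    print(safeguarded_step(profile, 5.0))   # rejected: higher error than at zero
```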




Figure 3.5: Line profile of the error function surface 

Having fixed the initial interval of uncertainty, the number of iterations of
the line search algorithm performed during each iteration of the conjugate gradient
method was varied to determine an optimal number. Using sixteen iterations, the
conjugate gradient algorithm was able to consistently reduce the value of the error
function. The value of the error function did not consistently drop when fewer than
sixteen iterations were used. Using equation 3.9 this resulted in a final interval of
uncertainty of 0.00626.
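
The arithmetic behind the final interval can be reproduced in a few lines. The sketch assumes the Fibonacci sequence starts at $F_0 = F_1 = 1$, which is the convention under which sixteen iterations on an initial interval of 10.0 give the quoted value; the function names are hypothetical.

```python
def fibonacci(n):
    """Return F_n with the assumed convention F_0 = F_1 = 1."""
    previous, current = 1, 1
    for _ in range(n - 1):
        previous, current = current, previous + current
    return current if n >= 1 else 1


def final_interval(initial_interval, n):
    """Final interval of uncertainty from equation 3.9."""
    return initial_interval / fibonacci(n)


def iterations_for_accuracy(initial_interval, tolerance):
    """Smallest n whose final interval does not exceed the tolerance."""
    n = 1
    while final_interval(initial_interval, n) > tolerance:
        n += 1
    return n


if __name__ == "__main__":
    print(fibonacci(16))                         # 1597
    print(round(final_interval(10.0, 16), 5))    # 0.00626
    print(iterations_for_accuracy(10.0, 0.01))   # 16
```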
C. COMPUTER PROGRAM IMPLEMENTATION



1. Conjugate Gradient Algorithm 

The conjugate gradient algorithm was implemented for a multiple input,
single output neural network using the C programming language. A flow chart
showing the basic functions that are performed by the program is shown in Figure 3.6. The
user is prompted at the start of the program for the number of neurons in each stage
of the neural network, the number of iterations of the conjugate gradient algorithm
that should be performed, and the name of the input file that contains the training
data that the algorithm will use to adapt the weights and thresholds of the network.
The number of neurons allowed in the network is limited to a total of 50 and the
number of weights connecting the neurons is limited to 500. This maximum number
of neurons and weights was more than large enough for the various problems to which
the program was applied. The training data file consists of columns of data in which
each column is associated with an input to the neural network except for the last
column. The last column is the desired output of the neural network. Each row is a
separate training data set. Upon completion of the program three files are produced.
The first is a file that contains the final results. The first column of the file is the
desired value and the second column is the value that the neural network produced
using the final weights and thresholds of the network. If the algorithm has performed
as expected and reduced the error function to a small value, then the two columns of
data should be nearly identical. The second output file produced contains the final
weights and thresholds of the network. This file can then be used by any other
program which simulates the operation of a neural network with the same configuration
of neurons. The final file is produced only if the neural network has two inputs. The
file consists of a 21 x 21 matrix of neural network output values that were produced
by applying a sequence of twenty-one evenly spaced values between 0.0 and 1.0 to
each of the two inputs. The resulting file can be used to produce a three dimensional
mesh of the output surface of the neural network. Examples of the input screen,
output screen, and both the input and output files are contained in Appendix A. A
copy of the C program source code is contained in Appendix B.
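
The file layout described above can be illustrated with a short Python sketch. This is not the thesis's C program: whitespace-separated columns are an assumption (the actual file syntax is shown only in Appendix A), and the function names are hypothetical.

```python
def parse_training_rows(text):
    """Split each row into (inputs, desired_output).

    Assumes whitespace-separated numeric columns, with the last column
    holding the desired network output.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue   # skip blank lines
        values = [float(field) for field in fields]
        rows.append((values[:-1], values[-1]))
    return rows


def grid_values(count=21):
    """Evenly spaced values between 0.0 and 1.0, as used for the 21 x 21 output matrix."""
    return [i / (count - 1) for i in range(count)]


if __name__ == "__main__":
    sample = "0.0 1.0 1.0\n1.0 1.0 0.0\n"
    print(parse_training_rows(sample))
    axis = grid_values()
    pairs = [(a, b) for a in axis for b in axis]   # 441 input pairs for the mesh
    print(len(axis), len(pairs))
```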

2. Backpropagation Algorithm 

In order to evaluate the conjugate gradient algorithm's performance, the
backpropagation method was also implemented. The basic flow chart for the
backpropagation method is shown in Figure 3.7. Because of the similarity between the
conjugate gradient method and the backpropagation method, this required only a few
changes to the program that implemented the conjugate gradient algorithm. These
changes consisted of

• Replacing the stepsize value ($\alpha_k$) calculated by the Fibonacci line search with a
user specified constant referred to as the learning rate by the backpropagation
method.

• Replacing the deflection coefficient ($\beta_k$) which is calculated for every iteration
of the algorithm with a user specified constant referred to as the momentum
factor by the backpropagation method.

• Updating the weights and thresholds of the neural network after the application
of each training data set rather than upon completion of a complete pass through
the entire training data file.

The input and output files remain the same as those for the conjugate gradient version
of the program.
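
The three changes listed above reduce the update to a fixed-coefficient rule applied once per training pair. The Python sketch below assumes the common sign convention $\mathbf{d} = -\mathbf{g} + \beta\,\mathbf{d}_{\text{previous}}$ and $\mathbf{h} \leftarrow \mathbf{h} + \alpha\,\mathbf{d}$; the thesis's update equations are defined earlier in the chapter and may use a different sign convention. The function name is hypothetical.

```python
def backprop_step(h, d_previous, g, learning_rate, momentum):
    """One backpropagation update for a single training pair.

    The learning rate stands in for the line-search stepsize and the
    momentum factor stands in for the deflection coefficient.
    """
    d = [-gi + momentum * di for gi, di in zip(g, d_previous)]
    h_new = [hi + learning_rate * di for hi, di in zip(h, d)]
    return h_new, d


if __name__ == "__main__":
    h, d = [1.0, -0.5], [0.0, 0.0]
    # Two successive per-sample gradients (illustrative values only).
    for g in ([2.0, -1.0], [1.0, 0.5]):
        h, d = backprop_step(h, d, g, learning_rate=0.1, momentum=0.9)
        print(h, d)
```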

The following chapter compares the performance of the conjugate gradient 
and backpropagation algorithms and also presents the results of several neural network 
applications. 
Figure 3.6: Conjugate gradient algorithm flowchart

Figure 3.7: Backpropagation algorithm flowchart

IV. RESULTS



In this chapter, the results of the research conducted on neural networks using 
the conjugate gradient method are presented. The chapter is divided into two parts. 
The first concerns the performance of the conjugate gradient algorithm compared 
to that of the backpropagation method. The second provides several examples of 
neural network applications. Where possible, the performance of the neural network 
is compared to its linear counterpart. 

A. CONJUGATE GRADIENT ALGORITHM PERFORMANCE 
1. Performance Measures 

The rationale for implementing the conjugate gradient algorithm was to 
develop an alternative to the backpropagation method that would converge more 
quickly to the optimal set of weights and thresholds for a given problem. The error 
function (E) is a measure of whether the weights and thresholds of a neural network 
are optimum when applied to a particular problem. The smaller the error function 
value, the more nearly optimum the weights and thresholds are. Both algorithms 
reduce the value of the error function by iteratively adapting the weights and thresh- 
olds of the neural network. The rate at which the backpropagation and conjugate 
gradient algorithms converge to the optimal set of weights and thresholds can be 
measured using several methods. The simplest approach would be to determine the 
number of iterations each algorithm requires to reduce the value of the error function 
to a prescribed level. The number of iterations for each algorithm would then be 
compared and the algorithm requiring fewer iterations would be considered to
converge more quickly. This approach does not, however, take into account the greater
computational complexity of the conjugate gradient method. A more accurate
measure of performance for the purposes of comparison is the number of multiplications
performed by each algorithm. This measure better reflects the relative computational 
requirements of the two algorithms. The number of multiplications performed by each 
of the methods over one iteration is fixed. We can therefore calculate a multiplication 
ratio of the two methods and then use this ratio in conjunction with the number of 
iterations to compare their relative performance. 

2. Calculation of the Multiplication Ratio 

The number of multiplications performed by both the backpropagation
method and the conjugate gradient method over one iteration is a function of several
variables. These include the number of neurons and weights in the network, the size
of the training data file used to train the network, and the number of iterations
performed by the Fibonacci line search method. Tables 4.1 and 4.2 show the number of
multiplications required by various functions of the conjugate gradient and
backpropagation methods, respectively. The tables also show the total number of times each
function is performed during a single iteration of the algorithm. The variable $T$ is the
number of training data sets used to train the network, the variable $P$ is the number
of weights and thresholds in the network, and $R$ is the number of neurons in the
network. Table 4.1 figures reflect that the step size ($\alpha_k$) is calculated using sixteen
iterations of the Fibonacci line search algorithm. The total number of multiplications
($M$) performed by each of the algorithms is therefore

$$ M_{CG} = T(20P + 37R + 17) + 21(P + R) + 35 \tag{4.1} $$

for the conjugate gradient method and

$$ M_{BP} = T(5P + 5R) \tag{4.2} $$
TABLE 4.1: MULTIPLICATIONS - CONJUGATE GRADIENT METHOD

Function                           Times Performed    Number of Multiplies
Calculate network output           18T                P + 2R
Calculate gradient vector          T                  2P + R
Calculate deflection coefficient   1                  2P + 2R
Update direction vector            1                  P + R
Calculate step distance            1                  35
Update weight vector               18                 P + R
Calculate error sum                17                 T

TABLE 4.2: MULTIPLICATIONS - BACKPROPAGATION METHOD

Function                           Times Performed    Number of Multiplies
Calculate network output           T                  P + 2R
Calculate gradient vector          T                  2P + R
Update direction vector            T                  P + R
Update weight vector               T                  P + R

for the backpropagation method. We can then derive the multiplication ratio by
dividing $M_{CG}$ by $M_{BP}$ to obtain

$$ \mathrm{RATIO} = \frac{M_{CG}}{M_{BP}} = \frac{T(20P + 37R + 17) + 21(P + R) + 35}{T(5P + 5R)}. \tag{4.3} $$

Equation 4.3 can then be factored into four terms as shown below

$$ \mathrm{RATIO} = 4 + \frac{17(R + 1)}{5(P + R)} + \frac{21}{5T} + \frac{35}{5T(P + R)}. \tag{4.4} $$



For the purposes of approximation, the last two terms of equation 4.4 can be
eliminated since the number of training data sets used to train the neural network is
typically large. As the number of neurons in a network is increased, the number of
connections or weights in the network increases at a much greater rate. This happens
because each neuron in a given layer is connected to every neuron in the next layer
of the network. As a result, the second term of equation 4.4 steadily decreases as
the number of neurons is increased. The lower bound on the multiplication ratio is
therefore approximately four and the upper bound can be set at approximately five
for networks having more than just a few neurons.
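
The multiplication counts and the factored ratio can be cross-checked with a short Python sketch. The formulas are those of equations 4.1 through 4.4 and the row totals of Table 4.1; the function names and the sample values of $T$, $P$ and $R$ are hypothetical.

```python
def mults_cg(T, P, R):
    """Equation 4.1: multiplications per conjugate gradient iteration."""
    return T * (20 * P + 37 * R + 17) + 21 * (P + R) + 35


def mults_cg_from_table(T, P, R):
    """The same total, accumulated row by row from Table 4.1."""
    return (18 * T * (P + 2 * R)      # calculate network output
            + T * (2 * P + R)         # calculate gradient vector
            + (2 * P + 2 * R)         # calculate deflection coefficient
            + (P + R)                 # update direction vector
            + 35                      # calculate step distance
            + 18 * (P + R)            # update weight vector
            + 17 * T)                 # calculate error sum


def mults_bp(T, P, R):
    """Equation 4.2: multiplications per backpropagation iteration."""
    return T * (5 * P + 5 * R)


def ratio(T, P, R):
    """Equation 4.3."""
    return mults_cg(T, P, R) / mults_bp(T, P, R)


def ratio_factored(T, P, R):
    """Equation 4.4: the four-term form of the ratio."""
    return 4 + 17 * (R + 1) / (5 * (P + R)) + 21 / (5 * T) + 35 / (5 * T * (P + R))


if __name__ == "__main__":
    T, P, R = 1000, 100, 10   # illustrative sizes only
    print(mults_cg(T, P, R) == mults_cg_from_table(T, P, R))
    print(ratio(T, P, R), ratio_factored(T, P, R))
```

For the illustrative sizes used here the ratio falls between four and five, in line with the bounds stated in the text.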

3. Performance Results 

The performance of the conjugate gradient method was compared to the 
performance of the backpropagation method using two different training problems. 
The first consisted of training the neural network to produce a binary output of either 
one or zero depending on the inputs to the network. The second problem involved 
training the neural network to produce a specific value within the range of zero to 
one for a given set of inputs to the network. 

A plot of the normalized value of the error function versus the number of
iterations performed for the binary problem is pictured in Figure 4.1 for the
backpropagation algorithm and in Figure 4.2 for the conjugate gradient algorithm. Note the
difference in the horizontal scale of the two figures. The error function steadily
decreased for the conjugate gradient method while the error function actually increased
for approximately the first 100 iterations of the backpropagation algorithm. Also
note that the error function's rate of change was much more even for the conjugate
gradient algorithm than for the backpropagation method.

In order to compare the relative performance of the two algorithms, the 
multiplication ratio’s upper bound of five was used. Pictured in Figure 4.3 is a 
comparison of the two algorithms’ convergence rates with respect to the approximate 
number of multiplications performed by each algorithm. As can be seen, for the binary 
case, the conjugate gradient method consistently outperformed the backpropagation 
method for any given number of multiplications performed. 

The results were even more apparent for the continuous output problem. 
The backpropagation method was unable to significantly reduce the error function's 
value for the first 500 iterations of the algorithm, as is shown in Figure 4.4. The 
conjugate gradient method, however, steadily reduced the value of the error function 
after each iteration of the algorithm (Figure 4.5). Comparison of the convergence 
rates of the two methods with respect to the number of multiplications required in 
each case is shown in Figure 4.6. For any given number of multiplications the 
conjugate gradient method greatly outperformed the backpropagation method. 

The conclusion from the two examples above is that the conjugate gradient 
method performs as well as or better than the backpropagation method with respect to 
both the number of iterations and the number of multiplications required to reduce 
the error function to a desired level. The conjugate gradient method therefore satisfies 
one goal of this thesis which was to develop an alternative to the backpropagation 
method that would converge more quickly to the optimal set of weights and thresholds 
for any given problem. 

Figure 4.1: Binary problem - backpropagation 

Figure 4.2: Binary problem - conjugate gradient 

Figure 4.3: Binary problem - comparison (error sum versus normalized multiplications) 

Figure 4.4: Continuous problem - backpropagation 

Figure 4.5: Continuous problem - conjugate gradient 

Figure 4.6: Continuous problem - comparison 

B. NEURAL NETWORK APPLICATION RESULTS 



Several simple applications were chosen to evaluate the performance of the con- 
jugate gradient method vis-a-vis the backpropagation method. These applications 
were also used to develop a better understanding of the potential signal processing 
applications for the neural network. When possible, the neural network’s performance 
was compared to its linear counterpart. 

1. A Classification Problem 

The goal for this problem was to train a neural network to differentiate 
between two classes of inputs. The two classes of inputs consisted of points which 
fell either inside or outside of a circle with a diameter of 0.5 centered within a 
unit square as shown in Figure 4.7. This classification problem, although relatively 
simple, is representative of one of the primary tasks to which neural networks have 
been applied: pattern recognition and classification [Ref. 1:pp. 66-67]. 

The points used to construct the training data file were evenly spaced 0.1 
apart from zero to one for both the X_0 and X_1 coordinates as shown in Figure 4.7. 
This produced a total of 121 points over the unit square. The training data file 
was composed of 121 data sets, each set consisting of the coordinates for one of the 
training points and a value representing the desired class to which the point belonged. 
The desired value for a point falling inside the circle was a one. The desired value 
for a point falling outside the circle was a zero. The conjugate gradient algorithm 
was used to train a neural network which had two inputs, eight first layer neurons, 
four second layer neurons, and one output neuron (a 2-8-4-1 configuration). After 
100 iterations of the algorithm, the total squared error summed over the entire 121 
training data sets was 0.26 x 10^-2. The resulting output of the neural network as a 
function of its inputs is pictured in Figure 4.8. 
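The construction of the training data described above can be sketched in C as follows. This is only an illustrative sketch, not the routine used to build the thesis data file: the names `training_set` and `build_circle_training_data` are hypothetical, and the circle is assumed to be centered at (0.5, 0.5) with points strictly inside the radius of 0.25 assigned to class one. The inside test is done on integer grid indices so that no point is misclassified by rounding.

```c
#define GRID_POINTS 11                        /* 0.0, 0.1, ..., 1.0 on each axis */
#define NUM_SETS (GRID_POINTS * GRID_POINTS)  /* 121 training data sets */

/* One training data set: two input coordinates and the desired class. */
typedef struct {
    double x0;
    double x1;
    double desired;
} training_set;

/* Fill sets[] with the 121 grid points and return how many of them lie
 * inside the circle of diameter 0.5 centered at (0.5, 0.5) (assumed center). */
int build_circle_training_data(training_set sets[NUM_SETS])
{
    int i, j, inside_count = 0;

    for (i = 0; i < GRID_POINTS; i++) {
        for (j = 0; j < GRID_POINTS; j++) {
            int di = i - 5;   /* offset from the center in units of 0.1 */
            int dj = j - 5;
            /* (0.1*di)^2 + (0.1*dj)^2 < 0.25^2, multiplied through by 400 */
            int inside = (4 * (di * di + dj * dj) < 25);
            training_set *s = &sets[i * GRID_POINTS + j];

            s->x0 = i / 10.0;
            s->x1 = j / 10.0;
            s->desired = inside ? 1.0 : 0.0;
            inside_count += inside;
        }
    }
    return inside_count;
}
```

Under these assumptions 21 of the 121 points receive a desired value of one, and the ordering (first coordinate varying slowest) matches the excerpt of the input data file shown in Appendix A.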
Figure 4.7: Training data for the classification problem 



The neural network produced output values ranging from 1.56 x 10^-9 to 
1.0 and was able to properly identify the class to which each of the training data 
points belonged. The contour plot of the neural network output for a single contour 
value of 0.5 is shown in Figure 4.9. The plot clearly shows that the conjugate gradient 
algorithm was able to calculate a set of weights and thresholds for the neural network 
that very closely approximates the desired result. A circular decision region was 
formed that allowed the neural network to differentiate between points falling inside 
the circle and points falling outside the circle. This is because a neural network, due 
to its nonlinearity, has the ability to form arbitrarily complex decision regions. 

This simple example clearly demonstrates the ability of a neural network 
to produce a nonlinear mapping of a set of analog inputs to a single binary output 
value. In this case, this nonlinear mapping was used to produce the two decision 
regions pictured in Figure 4.9. For other applications, the formation of decision 
regions may not be called for. Rather, the output of the network may have to be 
continuously variable. 

2. Nonlinear Time Series Prediction 

The previous problem required the neural network to produce only a binary 
output of one or zero. The second application was selected so that the conjugate 
gradient algorithm’s performance could be evaluated for the case of a continuously 
variable range of desired output values. This type of application falls into a second 
class of tasks for which the neural network can be applied: nonlinear mapping of a 
set of analog inputs to an analog output value [Ref. 1:p. 67]. It was decided to apply 
the neural network to the problem of one-step prediction of a nonlinear time series. 
One-step prediction is a fairly common application in digital signal processing. A 
nonlinear time series was used since one-step prediction for a linear time series could 
easily be performed using a linear filter rather than a neural network. The method 
used to perform the prediction is similar to that used by a linear predictor. The 
next value in the series is predicted using the previous values of the series. The basic 
configuration is pictured in Figure 4.10. 

Figure 4.8: Neural network output versus input 

Figure 4.9: Neural network output contour plot 




Figure 4.10: Time series predictor 

For a linear predictor the output of the predictor is merely a weighted sum 
of a given number of previous values of the series. The neural network, however, can 
produce an output which is a nonlinear function of a given number of previous values. 
The nonlinear time series used to train and evaluate the conjugate gradient algorithm 
was produced using 

x_{n+1} = 4 B x_n (1 - x_n).    (4.5) 

This equation is referred to as the classic logistic or Feigenbaum map and has been 
studied quite extensively because of its simplicity and its application to chaos theory. 
This iterated equation (equation 4.5) produces an ergodic, chaotic time series that is 
bounded and quasi-periodic [Ref. 12:p. 10]. A training sequence of 100 samples was 
generated using equation 4.5 with the variable B equal to 1.0. This sequence was 
then used to adaptively calculate the optimal coefficients for a linear second order 
prediction filter using a recursive least squares method. The linear predictor's results 



are pictured in Figure 4.11. Only the first fifty samples of the sequence were plotted 
so that the two curves on the graph could be better differentiated. It is obvious 
from Figure 4.11 that the linear predictor was unable to accurately predict the next 
value in the nonlinear series using the two previous values of the series. When the 
difference between the actual and predicted signals is plotted, one can see that the 
magnitude of the error is almost as great as the magnitude of the original signal (see 
Figure 4.12). As was expected, the linear predictor performs poorly for a nonlinear 
problem. 
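The experiment above can be approximated with the short C sketch below. It generates the series of equation 4.5 and fits a second order linear predictor. Two points are assumptions rather than the thesis procedure: the coefficients are found by a batch least squares solution of the 2x2 normal equations instead of the recursive least squares method used in the text, and the starting value of the series (not stated in the text) is left as a parameter. The function names are hypothetical.

```c
#include <stddef.h>

/* Generate n samples of x[k+1] = 4*B*x[k]*(1 - x[k]) starting from x0. */
void logistic_series(double b, double x0, double x[], size_t n)
{
    size_t k;

    if (n == 0) {
        return;
    }
    x[0] = x0;
    for (k = 1; k < n; k++) {
        x[k] = 4.0 * b * x[k - 1] * (1.0 - x[k - 1]);
    }
}

/* Fit x[k] ~ c[0]*x[k-1] + c[1]*x[k-2] by batch least squares and return
 * the mean squared prediction error over the fitted samples.
 * Returns -1.0 if n < 3 or the 2x2 normal equations are singular. */
double fit_linear_predictor(const double x[], size_t n, double c[2])
{
    double r11 = 0.0, r12 = 0.0, r22 = 0.0, p1 = 0.0, p2 = 0.0;
    double det, sse = 0.0;
    size_t k;

    if (n < 3) {
        return -1.0;
    }
    for (k = 2; k < n; k++) {
        r11 += x[k - 1] * x[k - 1];
        r12 += x[k - 1] * x[k - 2];
        r22 += x[k - 2] * x[k - 2];
        p1 += x[k] * x[k - 1];
        p2 += x[k] * x[k - 2];
    }
    det = r11 * r22 - r12 * r12;
    if (det == 0.0) {
        return -1.0;
    }
    c[0] = (p1 * r22 - p2 * r12) / det;   /* Cramer's rule on the 2x2 system */
    c[1] = (p2 * r11 - p1 * r12) / det;
    for (k = 2; k < n; k++) {
        double e = x[k] - (c[0] * x[k - 1] + c[1] * x[k - 2]);
        sse += e * e;
    }
    return sse / (double)(n - 2);
}
```

A call such as `logistic_series(1.0, 0.3, x, 100)` followed by `fit_linear_predictor(x, 100, c)` reproduces the setting of the text for an assumed starting value of 0.3. The residual error stays large because no weighted sum of the two previous samples can follow the quadratic map.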

The same training sequence was then used by the conjugate gradient algorithm 
to train a neural network with a 2-4-2-1 configuration. The network was 
trained to predict the next value of the series based on the two previous values. After 
100 iterations, the sum of the squared errors over the 100 training data sets was 
7.25 x 10^-3. This equates to an average standard deviation from the actual signal 
of approximately 8.51 x 10^-3. The neural network's results are pictured in Figure 
4.13. It is apparent that the neural network performed much better than the linear 
predictor. The prediction error for the neural network is pictured in Figure 4.14. The 
magnitude of the neural network's prediction error is much smaller than that for the 
linear predictor. This error could be reduced even further if additional iterations 
of the conjugate gradient algorithm were performed. 
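The average deviation quoted above follows from the sum of squared errors and the number of training data sets. Written out, with E denoting the sum of squared errors and N the number of data sets (symbols introduced here only for this check):

```latex
\sigma \approx \sqrt{\frac{E}{N}}
       = \sqrt{\frac{7.25 \times 10^{-3}}{100}}
       = \sqrt{7.25 \times 10^{-5}}
       \approx 8.51 \times 10^{-3}
```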

This example demonstrates that a neural network is quite capable of per- 
forming nonlinear mapping of a set of analog inputs to an analog output. The neural 
network can also produce more accurate results than the linear approach when the 
problem to be solved is nonlinear. It must be recognized, however, that although 
the neural network produces more accurate results, it is much more computationally 
complex than the linear approach to the problem. 

3. Channel Equalization 



One final example will serve to demonstrate the potential applications for 
the neural network. The idea of using a neural network to perform channel equaliza- 
tion for a nonminimum phase transmission channel was borrowed from Gibson, Siu, 
and Cowan [Ref. 15]. The experimental results indicate that a neural network could 
potentially provide superior performance to its linear counterpart when the channel 
over which the digital data is transmitted is nonminimum phase. 

a. Transmission Channel Model and Equalizer Model 

When digital data is transmitted, it frequently becomes distorted by 
the channel over which it travels. This distortion can frequently be modeled using 
a linear time invariant (LTI) system [Ref. b:p. 420]. The channel model, shown in 
Figure 4.15, consists of the transfer function H(z) and a channel noise term n_i. 

Figure 4.15: Channel model and equalizer 

The transfer function of the channel is defined by a finite impulse response (FIR) equation 

H(z) = a_0 + a_1 z^{-1} + ... + a_k z^{-k}.    (4.6) 

The channel noise term n_i is typically assumed to be zero mean, additive white 
Gaussian noise. The purpose of a channel equalizer, also shown in Figure 4.15, is to 
reverse the distorting effects of the channel and to recover the original signal (x_i) using 
m samples of the received signal, y_i, y_{i-1}, ..., y_{i-m+1}. If we assume, for a moment, 
that the noise term (n_i) is zero, then the received signal y_i is merely a weighted sum 
of the present and past values of the original signal x_i. This can be expressed as 

y_i = \sum_{j=0}^{k} a_j x_{i-j}    (4.7) 

where a_j are the k+1 coefficients associated with the channel transfer function H(z). 
For a binary signal (±1), therefore, the received signal y_i can assume only one of 2^{k+1} 
possible values. If we then try to estimate the original signal x_i using an m-sample 
vector [y_i, y_{i-1}, ..., y_{i-m+1}], we can only form a fixed number of permutations of the 
received signal vector. Each received signal vector [y_i, y_{i-1}, ..., y_{i-m+1}] belongs to 
either the set of vectors corresponding to a transmitted binary one (+1) or the set of 
vectors corresponding to a transmitted binary zero (-1). The channel equalizer produces 
an estimate of the transmitted signal x_i by determining which set the received 
signal vector belongs to. It has been shown that a linear transversal equalizer can 
perform such an operation if the channel transfer function H(z) is minimum phase 
[Ref. 15:p. 1184]. If the channel transfer function is not minimum phase, then the two 
received vector sets are not linearly separable and a linear equalizer cannot accurately 
estimate x_i based on the received data vector set [y_i, y_{i-1}, ..., y_{i-m+1}]. If a delay, d, 
is introduced in the calculation of the estimate, such that at the i-th iteration the equalizer 
estimates the original signal x_{i-d}, then accurate estimation of the original signal can 
be achieved [Ref. 15:p. 1184]. This value for d, however, may not be known, or may 
vary with time. The result is that a linear transversal equalizer, even with a delay, 
may not be able to satisfactorily equalize a nonminimum phase channel. 
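The noise-free channel relation of equation 4.7 can be illustrated with a short C sketch. The function names `channel_output` and `enumerate_outputs` are hypothetical. The sketch lists the output produced by every pattern of the k+1 most recent binary symbols, which shows directly why only a finite set of received values is possible when the noise term is zero.

```c
#include <stddef.h>

/* Noise-free channel output y_i = sum over j of a[j] * x_{i-j} (equation 4.7).
 * x_recent[j] holds x_{i-j}, so x_recent[0] is the newest symbol. */
double channel_output(const double a[], const double x_recent[], size_t num_coeffs)
{
    double y = 0.0;
    size_t j;

    for (j = 0; j < num_coeffs; j++) {
        y += a[j] * x_recent[j];
    }
    return y;
}

/* Enumerate the output for every +/-1 pattern of the num_coeffs most recent
 * symbols. out[] must have room for 2^num_coeffs values; the count is returned.
 * Bit j of the pattern index selects x_{i-j}: 1 means +1, 0 means -1. */
size_t enumerate_outputs(const double a[], size_t num_coeffs, double out[])
{
    size_t count = (size_t)1 << num_coeffs;
    size_t pattern, j;

    for (pattern = 0; pattern < count; pattern++) {
        double y = 0.0;

        for (j = 0; j < num_coeffs; j++) {
            double symbol = ((pattern >> j) & 1u) ? 1.0 : -1.0;
            y += a[j] * symbol;
        }
        out[pattern] = y;
    }
    return count;
}
```

For the first order channel H(z) = 0.5 + z^{-1} examined in the next subsection, `enumerate_outputs` with coefficients {0.5, 1.0} yields the four values -1.5, -0.5, +0.5, and +1.5.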
b. A Nonminimum Phase Channel Equalizer 

The ability of a neural network to form arbitrary decision regions, 
demonstrated in Chapter II, could possibly remedy this problem. To investigate this 
concept, the first order nonminimum phase transfer function (H(z) = 0.5 + z^{-1}) was 
used to evaluate the performance of both a neural network and a linear transversal 
equalizer. The possible values for y_i using this transfer function are: +1.5, +0.5, -0.5, 
and -1.5. A two input neural network and a two input linear transversal equalizer were 
used since the channel's transfer function was only first order, and this allowed a 
graphical analysis of the problem. The eight possible combinations of y_i and y_{i-1} are 
shown in Figure 4.16. The symbol x indicates that the original signal x_i had a value 
of -1 and the symbol o indicates that x_i was equal to +1. Notice that the symbols 
are intermixed such that no single line can be drawn that will completely separate the 
two classes of symbols. This is what makes the nonminimum phase case intractable 
for the linear transversal equalizer. If the noise term, n_i, is now incorporated into 
the problem, the result is as shown in Figure 4.17 for a signal-to-noise ratio (SNR) 
of 10 dB. The number of possible values for y_i becomes infinite, but the points are 
distributed about the original eight points shown in Figure 4.16. The coefficients 
for a first order linear transversal equalizer were calculated by applying a recursive 
least squares (RLS) algorithm to the set of 500 consecutive values of y_i pictured in 
Figure 4.17. The values for y_i were generated by using a random sequence of +1 and 
-1 for x_i, applying this binary sequence to the transfer function given above, and 
adding a normally distributed noise term with a standard deviation equivalent to a 
signal-to-noise ratio of 10 dB. The linear transversal equalizer's two decision regions 
are pictured in Figure 4.18. The region that is shaded with dots is the area for which 
the linear transversal equalizer produced an estimate of +1 for x_i, and the unshaded 
region is where the equalizer produced an estimate of -1 for x_i. Note that the best that 
the linear equalizer could do was to define two decision regions such that three of the 
four possible points fell within the proper region. The same 500 value data set was 
then used to train a neural network having a 2-6-4-1 configuration. The decision 
regions formed by the neural network after 100 iterations of the conjugate gradient 
algorithm are pictured in Figure 4.19. The neural network, because of its ability 
to account for the nonlinearities, was able to form two separate decision regions for 
each of the two possible values for x_i. The four decision regions properly encompass 
the eight possible points associated with y_i and y_{i-1}. As a result, the total number 
of errors produced over the 500 value training set dropped from 151 for the linear 
equalizer to 65 for the neural network. The neural network's ability to form more 
complex decision regions allowed it to more accurately perform equalization when the 
transfer function was nonminimum phase. 

c. A Nonminimum Phase Channel Equalizer Using a Delay 

Figure 4.19: Neural network decision regions 

It was stated earlier that introduction of a delay d could allow the 
linear equalizer to more accurately perform its equalization function. Pictured in 
Figure 4.20 are the eight possible points associated with y_i and y_{i-1} for a delay of 
one sample (i.e., the estimate of x_{i-1} based on the samples y_i and y_{i-1}). The two 
classes of points are no longer intermixed as they were for the case of no delay. A set 
of coefficients for the linear equalizer can therefore be calculated that will properly 
separate the two sets of points. With noise added, however, the sets of points begin to 
intermix as shown in Figure 4.21 for a signal-to-noise ratio of 10 dB. The separation 
of the two classes becomes more difficult, particularly for the linear equalizer, which 
can only use a single line to define the decision boundary. The coefficients for the 
linear equalizer were again calculated using the RLS algorithm and the 500 values 
for y_i pictured in Figure 4.21. The resulting decision regions are shown in Figure 
4.22. Comparison of the two decision regions with the original training data (Figure 
4.21) indicates that the linear equalizer was unable to define a single line that could 
separate all the points into their proper regions. The linear equalizer produced a 
total of 19 errors over the 500 values of the training data set. The same training data 
set was then used to train a neural network with a 2-6-4-1 configuration using the 
conjugate gradient algorithm. After twenty iterations, the neural network produced 
the two decision regions pictured in Figure 4.23. The boundary between the two 
decision regions is no longer a straight line but is shaped to take into account the 
distribution of points caused by the introduction of noise. The neural network only 
produced a total of 3 errors over the 500 value training set. 
d. A Performance Comparison 

Figure 4.22: Linear equalizer (with delay) decision regions 

Figure 4.23: Neural network (with delay) decision regions 

The results from the two above examples tend to indicate that 
a neural network can produce superior results to the linear equalizer both when 
a delay is introduced and when a delay is not introduced. In order to confirm this 
result, the performance of both the linear transversal equalizer and the neural network 
was evaluated for various signal-to-noise ratios. The measure of performance for 
the test was the average bit error probability. The four signal-to-noise ratios 5.0 
dB, 10 dB, 20 dB, and 25 dB were used to generate four different sets of training 
sequences. Each sequence was generated using a different signal-to-noise ratio. Both 
the linear equalizer and the neural network were then trained using these four 500 
value sequences for y_i. After calculating the coefficients for the linear equalizer and 
the weights and thresholds for the neural network, the bit error performance of each 
type of equalizer was calculated by passing the same 100,000 bit sequence through each 
equalizer and counting the number of times the equalizer produced an error. The 
results for the case where no delay was used are shown in Figure 4.24. As was expected, 
the bit error probability for the linear equalizer with no delay was extremely poor. The 
bit error probability for the neural network steadily dropped as the magnitude of the 
noise fell. The lowest of the three curves shown in Figure 4.24 reflects the performance 
of the neural network at the various signal-to-noise ratios after having been trained 
using the 10 dB SNR training data set. Its performance is equal to or better than the 
neural networks trained and evaluated for a specific SNR. This is because the lower 
SNR forced the conjugate gradient algorithm to produce a set of decision boundaries 
that were more nearly optimal. This result was even more apparent for the case when 
a delay was introduced in the equalization problem (Figure 4.25). The same method 
was used as described above, except that both the linear equalizer and neural network 
produced an estimate of x_{i-1}, rather than x_i, based on the received signals y_i and 
y_{i-1}. Once again the neural network performed better than the linear equalizer and 
the neural network trained using 10 dB data performed the best. 

One final comparison can be made between the neural network and 
the linear transversal equalizer. This is a comparison of the neural network without delay 
versus the linear equalizer with delay. This comparison is shown in Figure 4.26. Also 
shown is the neural network's performance with a delay. The neural network without 
delay did not perform as well as the linear equalizer for low signal-to-noise ratios. 
As the magnitude of the noise was reduced, however, the performance of the two 
approaches became comparable. The neural network with delay, however, was better 
than either of the other approaches. 

e. Channel Equalizer Conclusions 

Figure 4.25: Equalizer performance (with delay) 

The performance of both a linear transversal equalizer and a neural 
network was evaluated with respect to their ability to accurately equalize a nonminimum 
phase digital data channel. It was found that a linear transversal equalizer was 
unable to accurately estimate the original signal because of the channel's nonminimum 
phase characteristic. The neural network, because of its ability to form arbitrary 
boundaries, did not suffer from this problem. Introduction of a delay allowed both 
the linear transversal equalizer and the neural network to improve their performance. 
Finally, a neural network using no delay showed performance comparable to that of a linear 
transversal filter with a delay for high signal-to-noise ratios. The ability of the neural 
network to perform equalization without introduction of a delay could prove useful, 
particularly if the required delay is unknown or varies with time. 

V. CONCLUSIONS AND RECOMMENDATIONS 



A. CONCLUSIONS 

The first objective of this thesis research was to develop an alternative to the 
backpropagation method for calculating the optimal set of weights and thresholds for a 
neural network. The results presented in Chapter IV demonstrated that the conjugate 
gradient algorithm developed for this thesis was more computationally efficient than 
the well known backpropagation method. 

The second objective of this research was to develop a better understanding of 
the relationship between the structure of a neural network and its ability to perform 
input- to-output mapping. A graphical approach was used to analyze the internal 
representations of the neural network. The results of this analysis were presented in 
Chapter II. 

The final objective of this thesis research was to evaluate the performance of a 
neural network for several different signal processing applications. The first example 
presented demonstrated the ability of a neural network to perform classification. The 
second example, nonlinear time series prediction, compared the performance of a 
neural network to its linear equivalent, and showed that the neural network produced 
superior results. The final example illustrated the performance differences between a 
neural network and a linear approach to nonminimum phase channel equalization. 

These applications demonstrated that the nonlinear properties of a neural network 
frequently allow the neural network to perform functions more effectively than 
its linear counterpart. This is particularly true when the problem itself is nonlinear. 
It must be recognized, however, that there is a cost to this increased functionality. 
Calculation of the proper weights and thresholds for a given problem is much more 
computationally complex. The computational complexity associated with the use of 
a neural network must therefore be balanced against the accuracy desired when deciding 
whether to use a neural network rather than a linear approach to solve a given 
problem. 



B. FUTURE RESEARCH 

In the course of this thesis research, several other areas were identified that 
merit additional study. 

1. Transfer Function Selection 

The sigmoid function used for this research produced an output that ranged 
between 0 and 1. Other transfer functions could be investigated that produce a bipolar 
output. This could prove to be more useful for typical signal processing applications. 
One such transfer function that could be evaluated is the hyperbolic tangent function 

tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = \frac{1 - e^{-2z}}{1 + e^{-2z}}.    (5.1) 

This nonlinear function produces a value which ranges between ±1 and is continuously 
differentiable for all values of z. 

2. Neural Network Dynamic Range 

The performance of a neural network having a greater dynamic range could 
be investigated. The dynamic range of the neural network could be expanded by 
allowing adaptation of the output weight w_3. It could also be accomplished by using 
a linear transfer function for the single neuron in the output layer of the network. The 
output of the network would then be a linear combination of the weighted outputs 
from the second layer of the network. This is the approach taken by Lapedes and Farber in 
their research [Ref. 11]. 



3. Internal Representations 

This thesis made no attempt to analyze the internal representations used 
by the neural network to produce the desired outputs for a given set of inputs. Re- 
search could be conducted to try to determine exactly what type of functions the 
individual neurons in the network perform. This could provide further insight into 
the relationship between the structure of a neural network and its ability to perform 
a particular task. 

4. Analysis of the Weights and Thresholds 

Research could be performed to determine if there is any analytical signif- 
icance to the final weight and threshold values for a neural network. 



APPENDIX A: PROGRAM OUTPUT SCREEN 
AND DATA FILES 



A. EXAMPLE OUTPUT SCREEN 

** Conjugate Gradient Algorithm ** 

What is the name of the training data file? circ.dat 

How many inputs to the neural network? 2 

How many 1st layer neurons? 4 

How many 2nd layer neurons? 2 

There will be only one 3rd layer neuron. 

How many passes thru the training data set? 2 

Initial Error sum: 40.2786 

Performing iteration number 1 
Beta value: 0 
Alpha value: 0.10958 
Error sum: 17.3579 

Performing iteration number 2 
Beta value: 0.00366825 
Alpha value: 3.79148 
Error sum: 17.3564 

Final error sum: 17.3557 

Where do you want the results stored? circ.res 
** Calculating final results ** 

Where do you want the final weight/theta values stored? circ.wgt 
** Storing final weight/theta values ** 

Where do you want the map matrix stored? circ.map 
** Calculating map of network ** 

B. EXAMPLE INPUT DATA FILE 

Input 1        Input 2        Desired output 
4.0000e-001    4.0000e-001    1.0000e+000 
4.0000e-001    5.0000e-001    1.0000e+000 
4.0000e-001    6.0000e-001    1.0000e+000 
4.0000e-001    7.0000e-001    1.0000e+000 
4.0000e-001    8.0000e-001    0.0000e+000 
4.0000e-001    9.0000e-001    0.0000e+000 
4.0000e-001    1.0000e+000    0.0000e+000 
5.0000e-001    0.0000e+000    0.0000e+000 
5.0000e-001    1.0000e-001    0.0000e+000 
5.0000e-001    2.0000e-001    0.0000e+000 
5.0000e-001    3.0000e-001    1.0000e+000 
5.0000e-001    4.0000e-001    1.0000e+000 
5.0000e-001    5.0000e-001    1.0000e+000 
5.0000e-001    6.0000e-001    1.0000e+000 
5.0000e-001    7.0000e-001    1.0000e+000 
5.0000e-001    8.0000e-001    0.0000e+000 
5.0000e-001    9.0000e-001    0.0000e+000 
5.0000e-001    1.0000e+000    0.0000e+000 
6.0000e-001    0.0000e+000    0.0000e+000 
6.0000e-001    1.0000e-001    0.0000e+000 
6.0000e-001    2.0000e-001    0.0000e+000 
6.0000e-001    3.0000e-001    1.0000e+000 
6.0000e-001    4.0000e-001    1.0000e+000 
6.0000e-001    5.0000e-001    1.0000e+000 

C. EXAMPLE RESULTS OUTPUT DATA FILE 

Desired output    Actual output 
1.000000e+000     1.733739e-001 
1.000000e+000     1.738492e-001 
1.000000e+000     1.743229e-001 
1.000000e+000     1.747937e-001 
0.000000e+000     1.752599e-001 
0.000000e+000     1.757203e-001 
0.000000e+000     1.761736e-001 
0.000000e+000     1.714368e-001 
0.000000e+000     1.718988e-001 
0.000000e+000     1.723659e-001 
1.000000e+000     1.728365e-001 
1.000000e+000     1.733092e-001 
1.000000e+000     1.737825e-001 
1.000000e+000     1.742548e-001 
1.000000e+000     1.747247e-001 
0.000000e+000     1.751906e-001 
0.000000e+000     1.756512e-001 
0.000000e+000     1.761052e-001 
0.000000e+000     1.713874e-001 
0.000000e+000     1.718452e-001 
0.000000e+000     1.723086e-001 
1.000000e+000     1.727760e-001 
1.000000e+000     1.732460e-001 
1.000000e+000     1.737171e-001 



D. EXAMPLE FINAL WEIGHTS OUTPUT DATA FILE 



2 4 

1.023357 

-0.092301 

-0.061310 

-0.796225 

-0.618687 

0.513663 

-0.831033 

-0.279913 

0.262880 

-0.542745 

1.344298 

1.000000 



2 } Number of neurons in each layer 



0.621861 -0.194039 0.288292 

-0.595949 0.007433 -0.795105 



1st layer weights [u’i] 



-0.031015 0.059272 

-0.522201 
-0.103397 
0.325973 
0.906917 

0 . 139071 | 1st layer thresholds [$i] 
> 2nd layer weights [w 2 ] 

[ 2nd layer threshold [0 2 ] 

[Output weight [u’3] 



Input weights [u’o] 
Input thresholds [0 O ] 



APPENDIX B: PROGRAM SOURCE CODE 
LISTING 



#include <stdio.h> 
#include <stdlib.h> 
#include <math.h> 
#include <float.h> 

/********************************************************************/ 
/* This program calculates the weights and thresholds for a         */ 
/* feedforward multilayer neural network using the conjugate        */ 
/* gradient optimization method.                                    */ 
/********************************************************************/ 
/********************************************************************/ 
/* FUNCTION DECLARATIONS                                            */ 
/********************************************************************/ 

int get_info(char filename[], int num_node[]); 
int get_data(char filename[], double ts_data[], int num_inputs); 
int init_weights(double *weight_ptr, int num_node[]); 
int init_thetas(double *theta_ptr, int num_node[]); 
void adapt_network(double weight[], double theta[], int num_node[], 
                   int num_weights, int num_theta, double data_array[], 
                   int array_size, int max_iteration); 
double fire_neurons(double *activity_ptr, double *weight_ptr, 
                    double *theta_ptr, int num_node[]); 
void calc_gradient(double activity[], double weight[], 
                   double theta_gradient[], double gradient[], 
                   int num_node[], int num_weights, int num_theta); 
double calc_beta(double old_gradient[], double old_theta_gradient[], 
                 double new_gradient[], double new_theta_gradient[], 
                 int num_inputs, int num_theta); 
void update_direction(double gradient[], double direction[], double beta, 
                      int num_inputs); 
void update_weights(double weight[], double alpha, double direction[], 
                    int num_inputs); 
double calc_alpha(double weight[], double direction[], double theta[], 
                  double theta_direction[], double activity[], 
                  double data_array[], int array_size, int num_node[], 
                  int num_weights, int num_theta); 
void load_values(double *input_ptr, double *output_ptr, int total_num); 
int fibon(int n); 
void write_result(double weight[], double theta[], int num_node[], 
                  double ts_data[], int set_size); 
void map_network(double weight[], double theta[], int num_node[]); 
void store_weights(double weight[], double *theta_ptr, int num_node[]); 

/********************************************************************/ 
/* MAIN PROGRAM                                                     */ 
/********************************************************************/ 
int main(void) 
{ 
    char filename[14]; 
    int max_iteration, num_node[5], num_weights, set_size, num_theta; 
    double ts_data[3000], weight[400], theta[50]; 

    printf("\n ** Conjugate Gradient Algorithm ** \n"); 
    max_iteration = get_info(filename, num_node); 
    set_size = get_data(filename, ts_data, num_node[0]); 
    if (set_size == 0){ 
        exit(0);                      /* no usable training data */ 
    } 
    num_weights = init_weights(weight, num_node); 
    num_theta = init_thetas(theta, num_node); 
    adapt_network(weight, theta, num_node, num_weights, num_theta, 
                  ts_data, set_size, max_iteration); 
    write_result(weight, theta, num_node, ts_data, set_size); 
    store_weights(weight, theta, num_node); 
    if (num_node[0] == 2){ 
        map_network(weight, theta, num_node);  /* map only for two inputs */ 
    } 
    exit(0); 
} 

/* FUNCTION GET_INFO */

int get_info(char filename[], int num_node[])
{
    int max_iteration;

    printf("\n What is the name of the training data file? ");
    flushall();
    gets(filename);

    /* num_node[] and max_iteration are int, so %d is the matching conversion */
    printf("\n How many inputs to the neural network? ");
    scanf("%2d", &num_node[0]);

    printf("\n How many 1st layer neurons? ");
    scanf("%2d", &num_node[1]);

    printf("\n How many 2nd layer neurons? ");
    scanf("%2d", &num_node[2]);

    printf("\n There will be only one 3rd layer neuron. ");
    num_node[3] = 1;
    num_node[4] = 1;

    printf("\n\n How many passes thru the training data set? ");
    scanf("%4d", &max_iteration);
    return(max_iteration);
}

/********************************************************************/
/* FUNCTION GET_DATA */

int get_data(char filename[], double ts_data[], int num_inputs)
{
    FILE *stream;
    int i, num_read;

    num_read = 0;

    if ((stream = fopen(&filename[0], "r")) != NULL){
        /* Read up to 3000 values; the loop body is intentionally empty */
        for (i=0; (i < 3000) &&
             (fscanf(stream, "%lg", ts_data + i) > 0); i++)
            ;
        fclose(stream);
        if ((i % (num_inputs + 1)) != 0){
            printf("\n\n ** Improper number of input data elements **");
            num_read = 0;
        }
        else{
            num_read = i / (num_inputs + 1);
        }
    }
    else{
        printf("\n\n ** Could not find the specified file **");
    }

    return(num_read);
}

/********************************************************************/
/* FUNCTION INIT_WEIGHTS */
/********************************************************************/
int init_weights(double *weight_ptr, int num_node[])
{
/* 16384 is half the range of a 15-bit rand(); weights then start in (-1, 1] */
#define MAX_VAL 16384.0
    int num_weights, i;

    srand(1);
    num_weights = 0;
    for (i=0; i<3; i++){
        num_weights += num_node[i]*num_node[i+1];
    }

    for (i=0; i<(num_node[0]*num_node[1]); i++){
        *weight_ptr++ = (1.0 - (rand()/MAX_VAL));
    }

    for (i=0; i<(num_node[1]*num_node[2]); i++){
        *weight_ptr++ = (1.0 - (rand()/MAX_VAL));
    }

    for (i=0; i<(num_node[2]*num_node[3]); i++){
        *weight_ptr++ = (1.0 - (rand()/MAX_VAL));
    }

    *weight_ptr = 1.0;
    num_weights += 1;
    return(num_weights);
}

/********************************************************************/
/* FUNCTION INIT_THETAS */

int init_thetas(double *theta_ptr, int num_node[])
{
    int num_theta, i;

    num_theta = num_node[1]+num_node[2]+num_node[3];
    for (i=0; i<num_theta; i++){
        *theta_ptr++ = 0.0;
    }

    return(num_theta);
}

/* FUNCTION ADAPT_NETWORK */

void adapt_network(double weight[], double theta[], int num_node[],
                   int num_weights, int num_theta, double data_array[],
                   int array_size, int max_iteration)
{
    int iteration, i, j, set_num;
    double activity[50], gradient[400], direction[400], gradient_sum[400];
    double actual_output, desired_output, alpha, beta, old_gradient_mag;
    double theta_gradient[50], theta_sum[50], theta_direction[50];
    /* old_gradient_sum is sized to match gradient_sum, which it copies */
    double old_gradient_sum[400], old_theta_sum[50], error, errorsum;
    double *array_ptr;

    for (iteration=0; iteration<max_iteration; iteration++){
        for (i=0; i<num_weights; i++){
            gradient_sum[i] = 0.0;
        }

        for (i=0; i<num_theta; i++){
            theta_sum[i] = 0.0;
        }

        errorsum = 0.0;
        array_ptr = data_array;

        for (set_num=0; set_num<array_size; set_num++){
            for (i=0; i<num_node[0]; i++){
                activity[i] = *array_ptr++;
            }

            desired_output = *array_ptr++;
            actual_output = fire_neurons(activity, weight, theta, num_node);
            error = actual_output - desired_output;
            error *= error;

            gradient[num_weights-1] = (actual_output - desired_output)*
                                      actual_output;

            calc_gradient(activity, weight, theta_gradient, gradient, num_node,
                          num_weights, num_theta);

            for (i=0; i<(num_weights-1); i++){
                gradient_sum[i] += gradient[i];
            }

            for (i=0; i<num_theta; i++){
                theta_sum[i] += theta_gradient[i];
            }

            errorsum += error;
        }

        printf(" Error sum: %lg \n", errorsum);
        if (iteration == 0){
            beta = 0.0;
        }
        else{
            beta = calc_beta(old_gradient_sum, old_theta_sum, gradient_sum,
                             theta_sum, (num_weights-1), num_theta);
        }

        for (j=0; j<(num_weights-1); j++){
            old_gradient_sum[j] = gradient_sum[j];
        }

        for (j=0; j<num_theta; j++){
            old_theta_sum[j] = theta_sum[j];
        }

        printf("\n Performing iteration number %d \n", (iteration+1));
        printf(" Beta value: %lg \n", beta);

        update_direction(gradient_sum, direction, beta, (num_weights-1));
        update_direction(theta_sum, theta_direction, beta, num_theta);
        alpha = calc_alpha(weight, direction, theta, theta_direction, activity,
                           data_array, array_size, num_node, num_weights,
                           num_theta);

        printf(" Alpha value: %lg \n", alpha);

        update_weights(weight, alpha, direction, (num_weights-1));
        update_weights(theta, alpha, theta_direction, num_theta);
    }

    errorsum = 0.0;
    array_ptr = data_array;

    for (set_num=0; set_num<array_size; set_num++){
        for (i=0; i<num_node[0]; i++){
            activity[i] = *array_ptr++;
        }

        desired_output = *array_ptr++;
        actual_output = fire_neurons(activity, weight, theta, num_node);
        error = actual_output - desired_output;
        error *= error;
        errorsum += error;
    }

    printf("\n Final error sum: %lg \n", errorsum);
    return;
}

/* FUNCTION FIRE_NEURONS */

double fire_neurons(double *activity_ptr, double *weight_ptr,
                    double *theta_ptr, int num_node[])
{
    int layer_num, neuron_num, j;
    double temp, *input_ptr, *output_ptr;

    input_ptr = activity_ptr;
    output_ptr = activity_ptr + num_node[0];

    /* Feed input forward thru each layer of the network */
    for (layer_num=0; layer_num<3; layer_num++){
        for (neuron_num=0; neuron_num < num_node[layer_num+1]; neuron_num++){
            temp = 0.0;
            for (j=0; j < num_node[layer_num]; j++){
                temp -= (*weight_ptr++)*(input_ptr[j]);
            }

            temp += *theta_ptr++;
            *output_ptr++ = 1.0/(1.0+exp(temp));
        }

        input_ptr += num_node[layer_num];
    }

    temp = (*input_ptr) * (*weight_ptr);
    return(temp);
}

/* FUNCTION CALC_GRADIENT */
/********************************************************************/
void calc_gradient(double activity[], double weight[],
                   double theta_gradient[], double gradient[],
                   int num_node[], int num_weights, int num_theta)
{
    int layer_num, i, j, offset;
    double *weight_ptr, *gradient_ptr, *result_gradient_ptr;
    double *output_acty_ptr, *input_acty_ptr, temp, *theta_ptr;

    weight_ptr = &weight[num_weights-1];
    gradient_ptr = &gradient[num_weights-1];
    result_gradient_ptr = gradient_ptr - 1;

    output_acty_ptr = &activity[0] + (num_node[0]+num_node[1]+num_node[2]);
    input_acty_ptr = output_acty_ptr - 1;
    theta_ptr = &theta_gradient[num_theta-1];

    for (layer_num = 2; layer_num>-1; layer_num--){
        for (j=0; j<num_node[layer_num+1]; j++){
            temp = 0.0;
            offset = 0;

            for (i=0; i<num_node[layer_num+2]; i++){
                temp += (*weight_ptr) * (*gradient_ptr);
                weight_ptr -= num_node[layer_num+1];
                gradient_ptr -= num_node[layer_num+1];
            }

            offset = (num_node[layer_num+2]*num_node[layer_num+1])-1;
            weight_ptr += offset;
            gradient_ptr += offset;

            temp *= (1.0 - (*output_acty_ptr--));

            for (i=0; i<num_node[layer_num]; i++){
                (*result_gradient_ptr--) = temp * (*input_acty_ptr--);
            }

            *theta_ptr-- = (-temp);
            input_acty_ptr += num_node[layer_num];
        }

        input_acty_ptr -= num_node[layer_num];
    }

    return;
}

/* FUNCTION UPDATE_WEIGHTS */

void update_weights(double weight[], double alpha, double direction[],
                    int num_inputs)
{
    int i;

    for (i=0; i<num_inputs; i++){
        weight[i] += alpha*direction[i];
    }

    return;
}

/********************************************************************/
/* FUNCTION CALC_BETA */

double calc_beta(double old_gradient[], double old_theta_gradient[],
                 double new_gradient[], double new_theta_gradient[],
                 int num_inputs, int num_theta)
{
    int i;
    double beta, temp1, temp2;

    temp1 = 0.0;
    temp2 = 0.0;

    for (i=0; i<num_inputs; i++){
        temp1 += ((new_gradient[i]-old_gradient[i])*new_gradient[i]);
        temp2 += (old_gradient[i] * old_gradient[i]);
    }

    for (i=0; i<num_theta; i++){
        temp1 += ((new_theta_gradient[i]-old_theta_gradient[i])*
                  new_theta_gradient[i]);
        temp2 += (old_theta_gradient[i] * old_theta_gradient[i]);
    }

    beta = temp1/temp2;
    if (beta < 0.0){
        beta = 0.0;
    }

    return(beta);
}

/* FUNCTION UPDATE_DIRECTION */

void update_direction(double gradient[], double direction[],
                      double beta, int num_inputs)
{
    int i;

    for (i=0; i<num_inputs; i++){
        direction[i] *= beta;
        direction[i] -= gradient[i];
    }

    return;
}

/* FUNCTION CALC_ALPHA */

double calc_alpha(double weight[], double direction[], double theta[],
                  double theta_direction[], double activity[],
                  double data_array[], int array_size, int num_node[],
                  int num_weights, int num_theta)
{
    double a, b, lamda, mu, lamda_result, mu_result, desired_result, epsilon;
    double actual_result, test_weight[500], test_theta[50], *array_ptr;
    int i, k, set_num, max_steps;

    a = 0.0;
    b = 10.0;
    max_steps = 16;
    epsilon = 0.001;

    lamda = a+((b-a)*fibon(max_steps-2)/fibon(max_steps));
    mu = a+((b-a)*fibon(max_steps-1)/fibon(max_steps));
    a -= lamda;
    b -= lamda;
    mu -= lamda;
    lamda = 0.0;

    load_values(weight, test_weight, num_weights);
    load_values(theta, test_theta, num_theta);

    update_weights(test_weight, lamda, direction, (num_weights-1));
    update_weights(test_theta, lamda, theta_direction, num_theta);
    lamda_result = 0.0;
    array_ptr = data_array;

    for (set_num=0; set_num<array_size; set_num++){
        for (i=0; i<num_node[0]; i++){
            activity[i] = *array_ptr++;
        }

        desired_result = *array_ptr++;
        actual_result = fire_neurons(activity, test_weight, test_theta,
                                     num_node);
        actual_result -= desired_result;
        actual_result *= actual_result;
        lamda_result += actual_result;
    }

    load_values(weight, test_weight, num_weights);
    load_values(theta, test_theta, num_theta);
    update_weights(test_weight, mu, direction, (num_weights-1));
    update_weights(test_theta, mu, theta_direction, num_theta);
    mu_result = 0.0;
    array_ptr = data_array;

    for (set_num=0; set_num<array_size; set_num++){
        for (i=0; i<num_node[0]; i++){
            activity[i] = *array_ptr++;
        }

        desired_result = *array_ptr++;
        actual_result = fire_neurons(activity, test_weight, test_theta,
                                     num_node);
        actual_result -= desired_result;
        actual_result *= actual_result;
        mu_result += actual_result;
    }

    for (k=1; (k<(max_steps-1))&&(b>0.0); k++){
        if (lamda_result > mu_result){
            a = lamda;
            lamda = mu;
            lamda_result = mu_result;
            mu = ((b-a)/fibon(max_steps-k));
            mu *= fibon(max_steps-k-1);
            mu += a;

            load_values(weight, test_weight, num_weights);
            load_values(theta, test_theta, num_theta);

            update_weights(test_weight, mu, direction, (num_weights-1));
            update_weights(test_theta, mu, theta_direction, num_theta);
            mu_result = 0.0;
            array_ptr = data_array;

            for (set_num=0; set_num<array_size; set_num++){
                for (i=0; i<num_node[0]; i++){
                    activity[i] = *array_ptr++;
                }

                desired_result = *array_ptr++;
                actual_result = fire_neurons(activity, test_weight, test_theta,
                                             num_node);
                actual_result -= desired_result;
                actual_result *= actual_result;
                mu_result += actual_result;
            }
        }
        else{
            b = mu;
            mu = lamda;
            mu_result = lamda_result;
            lamda = ((b-a)/fibon(max_steps-k));
            lamda *= fibon(max_steps-k-2);
            lamda += a;

            load_values(weight, test_weight, num_weights);
            load_values(theta, test_theta, num_theta);

            update_weights(test_weight, lamda, direction, (num_weights-1));
            update_weights(test_theta, lamda, theta_direction, num_theta);
            lamda_result = 0.0;
            array_ptr = data_array;

            for (set_num=0; set_num<array_size; set_num++){
                for (i=0; i<num_node[0]; i++){
                    activity[i] = *array_ptr++;
                }

                desired_result = *array_ptr++;
                actual_result = fire_neurons(activity, test_weight, test_theta,
                                             num_node);
                actual_result -= desired_result;
                actual_result *= actual_result;
                lamda_result += actual_result;
            }
        }
    }

    if (b>0.0){
        mu = lamda + epsilon;

        load_values(weight, test_weight, num_weights);
        load_values(theta, test_theta, num_theta);
        update_weights(test_weight, mu, direction, (num_weights-1));
        update_weights(test_theta, mu, theta_direction, num_theta);
        mu_result = 0.0;
        array_ptr = data_array;

        for (set_num=0; set_num<array_size; set_num++){
            for (i=0; i<num_node[0]; i++){
                activity[i] = *array_ptr++;
            }

            desired_result = *array_ptr++;
            actual_result = fire_neurons(activity, test_weight, test_theta,
                                         num_node);
            actual_result -= desired_result;
            actual_result *= actual_result;
            mu_result += actual_result;
        }

        if (lamda_result > mu_result){
            if ((lamda+b) > 0.0){
                return((lamda+b)/2.0);
            }
            else{
                return(0.0);
            }
        }
        else{
            if ((lamda+a) > 0.0){
                return((lamda+a)/2.0);
            }
            else{
                return(0.0);
            }
        }
    }
    else{
        return(0.0);
    }
}



/********************************************************************/
/* FUNCTION LOAD_VALUES */

void load_values(double *input_ptr, double *output_ptr, int total_num)
{
    int i;

    for (i=0; i<total_num; i++){
        *output_ptr++ = *input_ptr++;
    }

    return;
}

/* FUNCTION FIBON */

int fibon(int n)
{
    int f0, f1, f2, k;

    f2=f1=f0=1;
    if (n < 2){
        return(1);
    }

    for (k=1; k<n; k++){
        f0 = f1 + f2;
        f2 = f1;
        f1 = f0;
    }

    return(f0);
}

/********************************************************************/
/* FUNCTION WRITE_RESULT */

void write_result(double weight[], double theta[], int num_node[],
                  double ts_data[], int set_size)
{
    FILE *fileptr;
    char fname[14];
    int i, set_num;
    double desired_result, result, activity[50], *array_ptr;

    printf("\n\n Where do you want the results stored? ");
    flushall();
    gets(&fname[0]);

    printf("\n ** Calculating final results ** \n");
    fileptr = fopen(&fname[0], "w");
    array_ptr = ts_data;

    for (set_num=0; set_num<set_size; set_num++){
        for (i=0; i<num_node[0]; i++){
            activity[i] = *array_ptr++;
        }

        desired_result = *array_ptr++;
        result = fire_neurons(activity, weight, theta, num_node);
        fprintf(fileptr, " %e %e \n", desired_result, result);
    }

    fclose(fileptr);
    return;
}

/********************************************************************/
/* FUNCTION MAP_NETWORK */
/********************************************************************/
void map_network(double weight[], double theta[], int num_node[])
{
    int row, col;
    double result, input1, input2, activity[50];
    FILE *fileptr;
    char fname[13];

    printf("\n\n Where do you want the map matrix stored? ");
    flushall();
    gets(&fname[0]);

    printf("\n ** Calculating map of network **\n");
    fileptr = fopen(&fname[0], "w");
    input1=input2=0.0;
    for (row=0; row<21; row++){
        for (col=0; col<21; col++){
            activity[0]=input1;
            activity[1]=input2;

            result=fire_neurons(activity, weight, theta, num_node);
            fprintf(fileptr, " %e", result);
            input1 += 0.05;
        }

        fprintf(fileptr, "\n");
        input1 = 0.0;
        input2 += 0.05;
    }

    fclose(fileptr);
    return;
}

/* FUNCTION STORE_WEIGHTS */

void store_weights(double weight[], double *theta_ptr, int num_node[])
{
    int i, j, k;
    double *weight_ptr1, *weight_ptr2;
    char fname[13];
    FILE *fileptr;

    printf("\n\n Where do you want the final weight/theta values stored? ");
    flushall();
    gets(&fname[0]);

    printf("\n ** Storing final weight/theta values **\n");
    fileptr = fopen(&fname[0], "w");
    for (i=0; i<3; i++){
        fprintf(fileptr, "%4d", num_node[i]);
    }

    fprintf(fileptr, "\n");
    weight_ptr2 = weight;
    for (i=0; i<3; i++){
        weight_ptr1 = weight_ptr2;
        for (j=0; j<num_node[i]; j++){
            weight_ptr1 = weight_ptr2 + j;
            for (k=0; k<num_node[i+1]; k++){
                fprintf(fileptr, "%10.6lf ", *weight_ptr1);
                weight_ptr1 += num_node[i];
            }

            fprintf(fileptr, " \n");
        }

        weight_ptr2 += (num_node[i]*num_node[i+1]);
        for (j=0; j<num_node[i+1]; j++){
            fprintf(fileptr, "%10.6lf ", *theta_ptr++);
        }

        fprintf(fileptr, " \n");
    }

    fprintf(fileptr, "%10.6lf \n", *weight_ptr2);

    fclose(fileptr);

    return;